  THIS IS MAINMATTER

CHAPTER I -- PYTHON BASICS
-------------------------------------------------------------------

  This chapter discusses Python capabilities that are likely to
  be used in text processing applications. For an introduction
  to Python syntax and semantics per se, readers might want to
  skip ahead to Appendix A (A Selective and Impressionistic
  Short Review of Python); Guido van Rossum's _Python Tutorial_
  at <http://python.org/doc/current/tut/tut.html> is also quite
  excellent. The focus here occupies a somewhat higher level:
  not the Python language narrowly, but also not yet specific to
  text processing.

  In Section 1.1, I look at some programming techniques that flow
  out of the Python language itself, but that are usually not
  obvious to Python beginners--and are sometimes not obvious even
  to intermediate Python programmers. The programming techniques
  that are discussed are ones that tend to be applicable to text
  processing contexts--other programming tasks are likely to have
  their own tricks and idioms that are not explicitly documented in
  this book.

  In Section 1.2, I document modules in the Python standard library
  that you will probably use in your text processing application,
  or at the very least want to keep in the back of your mind. A
  number of other Python standard library modules are far enough
  afield of text processing that you are unlikely to use them in
  this type of application. Such remaining modules are documented
  very briefly with one- or two-line descriptions. More details on
  each module can be found in Python's standard documentation.


SECTION 1 -- Techniques and Patterns
------------------------------------------------------------------------

  TOPIC -- Utilizing Higher-Order Functions in Text Processing
  --------------------------------------------------------------------

  This first topic merits a warning. It jumps feet-first into
  higher-order functions (HOFs) at a fairly sophisticated level
  and may be unfamiliar even to experienced Python programmers. Do
  not be too frightened by this first topic--you can understand the
  rest of the book without it. If the functional programming (FP)
  concepts in this topic seem unfamiliar to you, I recommend you
  jump ahead to Appendix A, especially its final section on FP
  concepts.

  In text processing, one frequently acts upon a series of chunks
  of text that are, in a sense, homogeneous.  Most often, these
  chunks are lines, delimited by newline characters--but
  sometimes other sorts of fields and blocks are relevant.
  Moreover, Python has standard functions and syntax for reading
  in lines from a file (sensitive to platform differences).
  Obviously, these chunks are not entirely homogeneous--they can
  contain varying data.  But at the level we worry about during
  processing, each chunk contains a natural parcel of instruction
  or information.

  As an example, consider an imperative style code fragment that
  selects only those lines of text that match a criterion
  'isCond()':

      #*---------- Imperative style line selection ------------#
      selected = []                 # temp list to hold matches
      fp = open(filename)
      for line in fp.readlines():   # Py2.2 -> "for line in fp:"
          if isCond(line):          # (2.2 version reads lazily)
              selected.append(line)
      del line                      # Cleanup transient variable

  There is nothing -wrong- with these few lines (see [xreadlines]
  on efficiency issues).  But it does take a few seconds to read
  through them.  In my opinion, even this small block of lines
  does not parse as a -single thought-, even though its operation
  really is such.  Also the variable 'line' is slightly
  superfluous (and it retains a value as a side effect after the
  loop and also could conceivably step on a previously defined
  value).  In FP style, we could write the simpler:

      #*---------- Functional style line selection ------------#
      selected = filter(isCond, open(filename).readlines())
      # Py2.2 -> filter(isCond, open(filename))
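
  For readers following along in a current Python 3 interpreter (an
  assumption on my part; the listings in this chapter target Python
  1.5 through 2.2), `filter()` returns a lazy iterator rather than a
  list, and a file object iterates over its own lines. A minimal
  sketch, in which the sample text and the body of 'isCond()' are
  made up purely for illustration:

```python
import io

# Hypothetical stand-in for open(filename): an in-memory "file"
fp = io.StringIO("keep: alpha\nskip: beta\nkeep: gamma\n")

def isCond(line):
    # Illustrative criterion only
    return line.startswith('keep:')

# filter() is lazy in Python 3; list() forces the selection
selected = list(filter(isCond, fp))

# An equivalent list comprehension over the same lines
fp.seek(0)
selected2 = [line for line in fp if isCond(line)]
```

  Either spelling still reads as a single thought; which one to
  prefer is largely a matter of taste.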

  In the concrete, a textual source that one frequently wants to
  process as a list of lines is a log file.  All sorts of
  applications produce log files, most typically either ones that
  cause system changes that might need to be examined or
  long-running applications that perform actions intermittently.
  For example, the PythonLabs Windows installer for Python 2.2
  produces a file called 'INSTALL.LOG' that contains a list of
  actions taken during the install.  Below is a highly abridged
  copy of this file from one of my computers:

      #------------ INSTALL.LOG sample data file --------------#
      Title: Python 2.2
      Source: C:\DOWNLOAD\PYTHON-2.2.EXE | 02-23-2002 | 01:40:54 | 7074248
      Made Dir: D:\Python22
      File Copy: D:\Python22\UNWISE.EXE | 05-24-2001 | 12:59:30 | | ...
      RegDB Key: Software\Microsoft\Windows\CurrentVersion\Uninstall\Py...
      RegDB Val: Python 2.2
      File Copy: D:\Python22\w9xpopen.exe | 12-21-2001 | 12:22:34 | | ...
      Made Dir: D:\PYTHON22\DLLs
      File Overwrite: C:\WINDOWS\SYSTEM\MSVCRT.DLL | | | | 295000 | 770c8856
      RegDB Root: 2
      RegDB Key: Software\Microsoft\Windows\CurrentVersion\App Paths\Py...
      RegDB Val: D:\PYTHON22\Python.exe
      Shell Link: C:\WINDOWS\Start Menu\Programs\Python 2.2\Uninstall Py...
      Link Info: D:\Python22\UNWISE.EXE | D:\PYTHON22 |  | 0 | 1 | 0 |
      Shell Link: C:\WINDOWS\Start Menu\Programs\Python 2.2\Python ...
      Link Info: D:\Python22\python.exe | D:\PYTHON22 | D:\PYTHON22\...

  You can see that each action recorded belongs to one of several
  types.  A processing application would presumably handle each
  type of action differently (especially since each action has
  different data fields associated with it).  It is easy enough
  to write Boolean functions that identify line types, for example:

      #*------- Boolean "predicative" functions on lines -------#
      def isFileCopy(line):
          return line[:10]=='File Copy:' # or line.startswith(...)
      def isFileOverwrite(line):
          return line[:15]=='File Overwrite:'

  The string method `"".startswith()` is less error prone than an
  initial slice for recent Python versions, but these examples
  are compatible with Python 1.5.  In a slightly more compact
  functional programming style, you can also write these like:

      #*----------- Functional style predicates ---------------#
      isRegDBRoot = lambda line: line[:11]=='RegDB Root:'
      isRegDBKey = lambda line: line[:10]=='RegDB Key:'
      isRegDBVal = lambda line: line[:10]=='RegDB Val:'

  Selecting lines of a certain type is done exactly as above:

      #*----------- Select lines that fill predicate ----------#
      lines = open(r'd:\python22\install.log').readlines()
      regroot_lines = filter(isRegDBRoot, lines)

  But if you want to select upon multiple criteria, an FP style
  can initially become cumbersome.  For example, suppose you are
  interested in all the "RegDB" lines; you could write a new custom
  function for this filter:

      #*--------------- Find the RegDB lines ------------------#
      def isAnyRegDB(line):
          if   line[:11]=='RegDB Root:': return 1
          elif line[:10]=='RegDB Key:':  return 1
          elif line[:10]=='RegDB Val:':  return 1
          else:                          return 0
      # For recent Pythons, line.startswith(...) is better
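
  As a sketch of the `line.startswith(...)` alternative mentioned in
  the comment, and assuming Python 2.5 or later (where the method
  also accepts a tuple of prefixes), the hand-counted slice lengths
  disappear altogether. The name 'REGDB_PREFIXES' and the sample
  lines are introduced here only for illustration:

```python
# str.startswith() accepts a tuple of prefixes (Python 2.5 and later)
REGDB_PREFIXES = ('RegDB Root:', 'RegDB Key:', 'RegDB Val:')

def isAnyRegDB(line):
    # True if the line begins with any one of the prefixes
    return line.startswith(REGDB_PREFIXES)

# Sample lines modeled on the INSTALL.LOG excerpt
lines = ['RegDB Root: 2\n',
         'Made Dir: D:\\Python22\n',
         'RegDB Val: Python 2.2\n']
regdb_lines = [line for line in lines if isAnyRegDB(line)]
```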

  Programming a custom function for each combined condition can
  produce a glut of named functions.  More importantly, each such
  custom function requires a modicum of work to write and has a
  nonzero chance of introducing a bug.  For conditions which
  should be jointly satisfied, you can either write custom
  functions or nest several filters within each other.  For
  example:

      #*------------- Filter on two line predicates -----------#
      shortline = lambda line: len(line) < 25
      short_regvals = filter(shortline, filter(isRegDBVal, lines))

  In this example, we rely on previously defined functions for the
  filter. Any error in the filters will be in either 'shortline()'
  or 'isRegDBVal()', but not independently in some third function
  'isShortRegVal()'. Such nested filters, however, are difficult to
  read--especially if more than two are involved.

  Calls to `map()` are sometimes similarly nested if several
  operations are to be performed on the same string. For a fairly
  trivial example, suppose you wished to reverse, capitalize, and
  normalize whitespace in lines of text. Creating the support
  functions is straightforward, and they could be nested in
  `map()` calls:

      #*------------ Multiple line transformations ------------#
      from string import upper, join, split
      def flip(s):
          a = list(s)
          a.reverse()
          return join(a,'')
      normalize = lambda s: join(split(s),' ')
      cap_flip_norms = map(upper, map(flip, map(normalize, lines)))

  This type of `map()` or `filter()` nest is difficult to read, and
  should be avoided. Moreover, one can sometimes be drawn into
  nesting alternating `map()` and `filter()` calls, making matters
  still worse. For example, suppose you want to perform several
  operations on each of the lines that meet several criteria. To
  avoid this trap, many programmers fall back to a more verbose
  imperative coding style that simply wraps the lists in a few
  loops and creates some temporary variables for intermediate
  results.

  Within a functional programming style, it is nonetheless possible
  to avoid the pitfall of excessive call nesting. The key to doing
  this is an intelligent selection of a few combinatorial
  -higher-order functions-. In general, a higher-order function is
  one that takes as argument or returns as result a function
  object. First-order functions just take some data as arguments
  and produce a datum as an answer (perhaps a data-structure like a
  list or dictionary). In contrast, the "inputs" and "outputs" of a
  HOF are themselves function objects--ones generally intended to be
  eventually called somewhere later in the program flow.

  One example of a higher-order function is a -function factory-:
  a function (or class) that returns a function, or collection of
  functions, that are somehow "configured" at the time of their
  creation.  The "Hello World" of function factories is an
  "adder" factory.  Like "Hello World," an adder factory exists
  just to show what can be done; it doesn't really -do- anything
  useful by itself.  Pretty much every explanation of function
  factories uses an example such as:

      >>> def adder_factory(n):
      ...    return lambda m, n=n: m+n
      ...
      >>> add10 = adder_factory(10)
      >>> add10
      <function <lambda> at 0x00FB0020>
      >>> add10(4)
      14
      >>> add10(20)
      30
      >>> add5 = adder_factory(5)
      >>> add5(4)
      9

  For text processing tasks, simple function factories are of
  less interest than are -combinatorial- HOFs. The idea of a
  combinatorial higher-order function is to take several (usually
  first-order) functions as arguments and return a new function
  that somehow synthesizes the operations of the argument
  functions. Below is a simple library of combinatorial
  higher-order functions that achieve surprisingly much in a
  small number of lines:

      #------------------- combinatorial.py -------------------#
      from operator import mul, add, truth
      apply_each = lambda fns, args=[]: map(apply, fns, [args]*len(fns))
      bools = lambda lst: map(truth, lst)
      bool_each = lambda fns, args=[]: bools(apply_each(fns, args))
      conjoin = lambda fns, args=[]: reduce(mul, bool_each(fns, args))
      all = lambda fns: lambda arg, fns=fns: conjoin(fns, (arg,))
      both = lambda f,g: all((f,g))
      all3 = lambda f,g,h: all((f,g,h))
      and_ = lambda f,g: lambda x, f=f, g=g: f(x) and g(x)
      disjoin = lambda fns, args=[]: reduce(add, bool_each(fns, args))
      some = lambda fns: lambda arg, fns=fns: disjoin(fns, (arg,))
      either = lambda f,g: some((f,g))
      anyof3 = lambda f,g,h: some((f,g,h))
      compose = lambda f,g: lambda x, f=f, g=g: f(g(x))
      compose3 = lambda f,g,h: lambda x, f=f, g=g, h=h: f(g(h(x)))
      ident = lambda x: x

  Even with just over a dozen lines, many of these combinatorial
  functions are merely convenience functions that wrap other more
  general ones. Let us take a look at how we can use these HOFs to
  simplify some of the earlier examples. The same names are used
  for results, so look above for comparisons:

      #----- Some examples using higher-order functions -----#
      # Don't nest filters, just produce func that does both
      short_regvals = filter(both(shortline, isRegDBVal), lines)

      # Don't multiply ad hoc functions, just describe need
      regroot_lines = \
          filter(some([isRegDBRoot, isRegDBKey, isRegDBVal]), lines)

      # Don't nest transformations, make one combined transform
      capFlipNorm = compose3(upper, flip, normalize)
      cap_flip_norms = map(capFlipNorm, lines)

  In the example, we bind the composed function 'capFlipNorm' for
  readability. The corresponding `map()` line expresses just the
  -single thought- of applying a common operation to all the lines.
  But the binding also illustrates some of the flexibility of
  combinatorial functions. By condensing the several operations
  previously nested in several `map()` calls, we can save the
  combined operation for reuse elsewhere in the program.

  As a rule of thumb, I recommend not using more than one
  `filter()` and one `map()` in any given line of code. If these
  "list application" functions need to nest more deeply than this,
  readability is preserved by saving results to intermediate names.
  Successive lines of such functional programming style calls
  themselves revert to a more imperative style--but a wonderful
  thing about Python is the degree to which it allows seamless
  combinations of different programming styles. For example:

      #*------ Limit nesting depth of map()/filter() ------#
      intermed = filter(niceProperty, map(someTransform, lines))
      final = map(otherTransform, intermed)

  Any nesting of successive `filter()` or `map()` calls, however,
  can be reduced to single functions using the proper combinatorial
  HOFs. Therefore, the number of procedural steps needed is almost
  always quite small. However, the reduction in total
  lines-of-code is partly offset by the lines used for giving names
  to combinatorial functions. Overall, in my experience FP style code
  is usually about one-half the length of imperative style
  equivalents (fewer lines generally mean correspondingly fewer bugs).

  A nice feature of combinatorial functions is that they can
  provide a complete Boolean algebra for functions that have not
  been called yet (the use of `operator.add` and `operator.mul` in
  'combinatorial.py' is more than accidental, in that sense). For
  example, with a collection of simple values, you might express a
  (complex) relation of multiple truth values as, for example:

      #*---------- Simple Boolean algebra of values ----------#
      satisfied = (this or that) and (foo or bar)

  In the case of text processing on chunks of text, these truth
  values are often the results of predicative functions applied
  to a chunk, for example:

      #*---------- Boolean algebra of return values ----------#
      satisfied = (thisP(s) or thatP(s)) and (fooP(s) or barP(s))

  In an expression like the above one, several predicative
  functions are applied to the same string (or other object), and
  a set of logical relations on the results are evaluated. But
  this expression is itself a logical predicate of the string. For
  naming clarity--and especially if you wish to evaluate the same
  predicate more than once--it is convenient to create an actual
  function expressing the predicate:

      #*------ Boolean algebra of composed functions ------#
      satisfiedP = both(either(thisP,thatP), either(fooP,barP))

  Using a predicative function created with combinatorial
  techniques is the same as using any other function:

      #*------ Use of a compositional Boolean function ------#
      selected = filter(satisfiedP, lines)
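
  To make the correspondence concrete, here is a small worked
  sketch. The four predicates are hypothetical stand-ins for
  'thisP()' and friends, the sample lines are loosely modeled on the
  INSTALL.LOG excerpt, and 'both()' and 'either()' are redefined
  compactly so that the fragment runs by itself on a current Python:

```python
def both(f, g):
    # '&' on two bools evaluates both sides (no shortcutting)
    return lambda x: bool(f(x)) & bool(g(x))

def either(f, g):
    return lambda x: bool(f(x)) | bool(g(x))

# Hypothetical predicates standing in for thisP, thatP, fooP, barP
thisP = lambda s: s.startswith('File Copy:')
thatP = lambda s: s.startswith('File Overwrite:')
fooP = lambda s: '.EXE' in s.upper()
barP = lambda s: '.DLL' in s.upper()

satisfiedP = both(either(thisP, thatP), either(fooP, barP))

# Illustrative sample lines
lines = ['File Copy: D:\\Python22\\UNWISE.EXE',
         'Made Dir: D:\\PYTHON22\\DLLs',
         'File Overwrite: C:\\WINDOWS\\SYSTEM\\MSVCRT.DLL',
         'File Copy: D:\\Python22\\README.txt']
selected = [s for s in lines if satisfiedP(s)]
```

  For every line, 'satisfiedP(s)' agrees with the value-level
  expression '(thisP(s) or thatP(s)) and (fooP(s) or barP(s))'.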


  EXERCISE:  More on combinatorial functions
  --------------------------------------------------------------------

  The module 'combinatorial.py' presented above provides some of
  the most commonly useful combinatorial higher-order functions.
  But there is room for enhancement in the brief example. Creating
  a personal or organizational library of useful HOFs is a way to
  improve the reusability of your current text processing
  libraries.

  QUESTIONS:

  1.  Some of the functions defined in 'combinatorial.py' are
      not, strictly speaking, combinatorial.  In a precise sense,
      a combinatorial function should take one or several
      functions as arguments and return one or more function
      objects that "combine" the input arguments.  Identify which
      functions are not "strictly" combinatorial, and determine
      exactly what type of thing each one -does- return.

  2.  The functions 'both()' and 'and_()' do almost the same
      thing.  But they differ in an important, albeit subtle, way.
      'and_()', like the Python operator `and`, uses -shortcutting-
      in its evaluation.  Consider these lines:

      >>> f = lambda n: n**2 > 10
      >>> g = lambda n: 100/n > 10
      >>> and_(f,g)(5)
      1
      >>> both(f,g)(5)
      1
      >>> and_(f,g)(0)
      0
      >>> both(f,g)(0)
      Traceback (most recent call last):
      ...

      The shortcutting 'and_()' can potentially allow the first
      function to act as a "guard" for the second one.  The second
      function never gets called if the first function returns a
      false value on a given argument.

      a. Create a similarly shortcutting combinatorial 'or_()'
         function for your library.

      b. Create general shortcutting functions 'shortcut_all()'
         and 'shortcut_some()' that behave similarly to the
         functions 'all()' and 'some()', respectively.

      c. Describe some situations where nonshortcutting
         combinatorial functions like 'both()', 'all()', or
         'anyof3()' are more desirable than similar shortcutting
         functions.

  3.  The function 'ident()' would appear to be pointless, since
      it simply returns whatever value is passed to it.  In truth,
      'ident()' is an almost indispensable function for a
      combinatorial collection.  Explain the significance of
      'ident()'.

      Hint: Suppose you have a list of lines of text, where some
      of the lines may be empty strings.  What filter can you
      apply to find all the lines that start with a '#'?

  4.  The function 'not_()' might make a nice addition to a
      combinatorial library.  We could define this function as:

      >>> not_ = lambda f: lambda x, f=f: not f(x)

      Explore some situations where a 'not_()' function would aid
      combinatoric programming.

  5.  The function 'apply_each()' is used in 'combinatorial.py'
      to build some other functions.  But the utility of
      'apply_each()' is more general than its supporting role
      might suggest.  A trivial usage of 'apply_each()' might
      look something like:

      >>> apply_each(map(adder_factory, range(5)),(10,))
      [10, 11, 12, 13, 14]

      Explore some situations where 'apply_each()' simplifies
      applying multiple operations to a chunk of text.

  6.  Unlike the functions 'all()' and 'some()', the functions
      'compose()' and 'compose3()' take a fixed number of input
      functions as arguments.  Create a generalized composition
      function that takes a list of input functions, of any
      length, as an argument.

  7.  What other combinatorial higher-order functions that have
      not been discussed here are likely to prove useful in text
      processing? Consider other ways of combining first-order
      functions into useful operations, and add these to your
      library.  What are good names for these enhanced HOFs?


  TOPIC -- Specializing Python Datatypes
  --------------------------------------------------------------------

  Python comes with an excellent collection of standard
  datatypes--Appendix A discusses each built-in type. At the same
  time, an important principle of Python programming makes types
  less important than programmers coming from other languages tend
  to expect. According to Python's "principle of pervasive
  polymorphism" (my own coinage), it is more important what an
  object -does- than what it -is-. Another common way of putting
  the principle is: if it walks like a duck and quacks like a duck,
  treat it like a duck.

  Broadly, the idea behind polymorphism is letting the same
  function or operator work on things of different types. In C++ or
  Java, for example, you might use signature-based method
  overloading to let an operation apply to several types of things
  (acting differently as needed). For example:

      #------------ C++ signature-based polymorphism -----------#
      #include <stdio.h>
      class Print {
      public:
        void print(int i)    { printf("int %d\n", i); }
        void print(double d) { printf("double %f\n", d); }
        void print(float f)  { printf("float %f\n", f); }
      };
      int main() {
        Print *p = new Print();
        p->print(37);      /* --> "int 37" */
        p->print(37.0);    /* --> "double 37.000000" */
        delete p;
        return 0;
      }

  The most direct Python translation of signature-based overloading
  is a function that performs type checks on its argument(s). It is
  simple to write such functions:

      #------- Python "signature-based" polymorphism -----------#
      def Print(x):
          from types import *
          if type(x) is FloatType:  print "float", x
          elif type(x) is IntType:  print "int", x
          elif type(x) is LongType: print "long", x

  Writing signature-based functions, however, is extremely
  un-Pythonic. If you find yourself performing these sorts of
  explicit type checks, you have probably not understood the
  problem you want to solve correctly! What you -should- (usually)
  be interested in is not what type 'x' is, but rather whether 'x'
  can perform the action you need it to perform (regardless of what
  type of thing it is, strictly speaking).

  PYTHONIC POLYMORPHISM:

  Probably the single most common case where pervasive polymorphism
  is useful is in identifying "file-like" objects. There are many
  objects that can do things that files can do, such as those
  created with [urllib], [cStringIO], [zipfile], and by other
  means. Various objects can perform only subsets of what actual
  files can: some can read, others can write, still others can
  seek, and so on. But for many purposes, you have no need to
  exercise every "file-like" capability--it is good enough to make
  sure that a specified object has those capabilities you actually
  need.
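
  A minimal sketch of the idea, written for a current Python 3 (an
  assumption; there, `io.StringIO` plays the role of [cStringIO]).
  The function 'count_lines()' and the class 'Shouter' are invented
  for illustration. The function never asks what type its argument
  is; it simply uses the one capability it needs:

```python
import io

def count_lines(src):
    # Relies only on the .read() capability, not on the type of 'src'
    return len(src.read().splitlines())

class Shouter:
    # Hypothetical object: not a file, but it knows how to .read()
    def __init__(self, text):
        self.text = text
    def read(self):
        return self.text.upper()

n1 = count_lines(io.StringIO("one\ntwo\nthree\n"))
n2 = count_lines(Shouter("spam\neggs\n"))
```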

  Here is a typical example. I have a module that uses DOM to work
  with XML documents; I would like users to be able to specify an
  XML source in any of several ways: using the name of an XML file,
  passing a file-like object that contains XML, or indicating an
  already-built DOM object to work with (built with any of several
  XML libraries). Moreover, future users of my module may get their
  XML from novel places I have not even thought of (an RDBMS, over
  sockets, etc.). By looking at what a candidate object can -do-, I
  can just utilize whichever capabilities that object -has-:

      #-------- Python capability-based polymorphism -----------#
      def toDOM(xml_src=None):
          from xml.dom import minidom
          from types import StringType, UnicodeType
          if hasattr(xml_src, 'documentElement'):
              return xml_src    # it is already a DOM object
          elif hasattr(xml_src,'read'):
              # it is something that knows how to read data
              return minidom.parseString(xml_src.read())
          elif type(xml_src) in (StringType, UnicodeType):
              # it is a filename of an XML document
              xml = open(xml_src).read()
              return minidom.parseString(xml)
          else:
              raise ValueError, "Must be initialized with " +\
                    "filename, file-like object, or DOM object"
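
  For comparison, here is a rough rendering of the same function for
  a current Python 3, where 'StringType' and 'UnicodeType' no longer
  exist and `raise` takes a call expression. Treat it as a sketch
  under that assumption rather than a drop-in replacement; the sample
  XML is made up:

```python
import io
from xml.dom import minidom

def toDOM(xml_src=None):
    if hasattr(xml_src, 'documentElement'):
        return xml_src                     # already a DOM object
    elif hasattr(xml_src, 'read'):
        # it is something that knows how to read data
        return minidom.parseString(xml_src.read())
    elif isinstance(xml_src, str):
        with open(xml_src) as fh:          # treat a string as a filename
            return minidom.parseString(fh.read())
    else:
        raise ValueError("Must be initialized with "
                         "filename, file-like object, or DOM object")

dom = toDOM(io.StringIO("<doc><p>spam</p></doc>"))
same = toDOM(dom)       # a DOM object passes straight through
```

  The order of the tests matters: capabilities are checked first, and
  the one explicit type check is a last resort for filenames.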

  Even simple-seeming numeric types have varying capabilities. As
  with other objects, you should not usually care about the
  internal representation of an object, but rather about what it
  can do. Of course, as one way to assure that an object has a
  capability, it is often appropriate to coerce it to a type using
  the built-in functions `complex()`, `dict()`, `float()`, `int()`,
  `list()`, `long()`, `str()`, `tuple()` and `unicode()`. All of
  these functions make a good effort to transform anything that
  looks a little bit like the type of thing they name into a true
  instance of it. It is usually not necessary, however, actually to
  transform values to prescribed types; again we can just check
  capabilities.

  For example, suppose that you want to remove the "least
  significant" portion of some numbers--perhaps because they
  represent measurements of limited accuracy. For whole
  numbers--ints or longs--you might mask out some low-order bits;
  for fractional values you might round to a given precision.
  Rather than testing value types explicitly, you can look for
  numeric capabilities. One common way to test a capability in
  Python is to -try- to do something, and catch any exceptions that
  occur (then try something else). Below is a simple example:

      #----------- Checking what numbers can do ---------------#
      def approx(x):                # int attributes require 2.2+
          if hasattr(x,'__and__'):  # supports bitwise-and
              return x & ~0x0FL
          try:                      # supports real/imag
              return (round(x.real,2)+round(x.imag,2)*1j)
          except AttributeError:
              return round(x,2)
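
  One caution for readers on later interpreters: in Python 3 (an
  assumption about your environment), plain floats also carry
  '.real' and '.imag' attributes, so the 'try' branch above would
  hand back a complex number for a float argument, and the '0x0FL'
  literal is no longer valid syntax. The sketch below keeps the same
  try-it-and-see spirit but attempts the operations themselves,
  falling through on 'TypeError':

```python
def approx(x):
    try:
        return x & ~0x0F            # works for anything supporting bitwise-and
    except TypeError:
        pass
    try:
        return round(x, 2)          # works for real-valued numbers
    except TypeError:
        # complex numbers cannot be rounded directly; round each part
        return complex(round(x.real, 2), round(x.imag, 2))

whole = approx(29)
frac = approx(2.71828)
cplx = approx(1.2345 + 2.3456j)
```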

  ENHANCED OBJECTS:

  The reason that the principle of pervasive polymorphism matters
  is that Python makes it easy to create new objects that behave
  mostly--but not exactly--like basic datatypes.  File-like
  objects were already mentioned as examples; you may or may not
  think of a file object as a datatype precisely.  But even basic
  datatypes like numbers, strings, lists, and dictionaries can be
  easily specialized and/or emulated.

  There are two details to pay attention to when emulating basic
  datatypes.  The most important matter to understand is that the
  capabilities of an object--even those utilized with syntactic
  constructs--are generally implemented by its "magic" methods,
  each named with leading and trailing double underscores.  Any
  object that has the right magic methods can act like a basic
  datatype in those contexts that use the supplied methods.  At
  heart, a basic datatype is just an object with some
  well-optimized versions of the right collection of magic
  methods.
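
  As a small illustration, the hypothetical class 'Words' below
  defines three magic methods and thereby works with `len()`,
  indexing, slicing, iteration, and the `in` operator, without
  inheriting from any built-in sequence. It is a sketch written for a
  current Python, not a complete sequence implementation:

```python
class Words:
    # Hypothetical sequence-like view of the words in a string
    def __init__(self, text):
        self._words = text.split()
    def __len__(self):                 # supports len(obj)
        return len(self._words)
    def __getitem__(self, index):      # supports obj[i], slices, for-loops
        return self._words[index]
    def __contains__(self, word):      # supports "word in obj"
        return word in self._words

w = Words("the quick brown fox")
```

  Iteration works here because Python falls back to calling
  '.__getitem__()' with successive integers until an 'IndexError'
  is raised.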

  The second detail concerns exactly how you get at the magic
  methods--or rather, how best to make use of existing
  implementations. There is nothing stopping you from writing your
  own version of any basic datatype, except for the piddling
  details of doing so. However, there are quite a few such details,
  and the easiest way to get the functionality you want is to
  specialize an existing class. Under all non-ancient versions of
  Python, the standard library provides the pure-Python modules
  [UserDict], [UserList], and [UserString] as starting points for
  custom datatypes. You can inherit from an appropriate parent
  class and specialize (magic) methods as needed. No sample parents
  are provided for tuples, ints, floats, and the rest, however.

  Under Python 2.2 and above, a better option is available.
  "New-style" Python classes let you inherit from the underlying C
  implementations of all the Python basic datatypes. Moreover,
  these parent classes have become the self-same callable objects
  that are used to coerce types and construct objects: `int()`,
  `list()`, `unicode()`, and so on. A good deal of arcana and
  many subtle profundities accompany new-style classes, but you
  generally do not need to worry about these. All you need to know
  is that a class that inherits from [str] is faster than one
  that inherits from [UserString]; likewise for [list] versus
  [UserList] and [dict] versus [UserDict] (assuming your scripts
  all run on a recent enough version of Python).
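
  A brief sketch of inheriting directly from a built-in type,
  assuming Python 2.2 or later (it is written so that it also runs on
  Python 3). The class 'LineList', its method 'matching()', and the
  sample lines are invented for illustration; everything else about
  the object is ordinary list behavior supplied by the parent:

```python
class LineList(list):
    # Hypothetical list of lines with one extra, non-magic method
    def matching(self, prefix):
        return LineList([line for line in self if line.startswith(prefix)])

log = LineList(['RegDB Root: 2',
                'Made Dir: D:\\Python22',
                'RegDB Val: Python 2.2'])
log.append('RegDB Key: Software')     # inherited list method
regdb = log.matching('RegDB')
```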

  Custom datatypes, however, need not specialize full-fledged
  implementations. You are free to create classes that implement
  "just enough" of the interface of a basic datatype to be used for
  a given purpose. Of course, in practice, the reason you would
  create such custom datatypes is either because you want them to
  contain non-magic methods of their own or because you want them
  to implement the magic methods associated with multiple basic
  datatypes. For example, below is a custom datatype that can be
  passed to the prior 'approx()' function, and that also provides a
  (slightly) useful custom method:

      >>> class I:  # "Fuzzy" integer datatype
      ...     def __init__(self, i):  self.i = i
      ...     def __and__(self, i):   return self.i & i
      ...     def err_range(self):
      ...         lbound = approx(self.i)
      ...         return "Value: [%d, %d)" % (lbound, lbound+0x0F)
      ...
      >>> i1, i2 = I(29), I(20)
      >>> approx(i1), approx(i2)
      (16L, 16L)
      >>> i2.err_range()
      'Value: [16, 31)'

  Despite supporting an extra method and being able to get passed
  into the 'approx()' function, 'I' is not a very versatile
  datatype.  If you try to add, or divide, or multiply using
  "fuzzy integers," you will raise a 'TypeError'.  Since there
  is no module called [UserInt], under an older Python version
  you would need to implement every needed magic method yourself.

  Using new-style classes in Python 2.2+, you could derive a
  "fuzzy integer" from the underlying 'int' datatype.  A partial
  implementation could look like:

      >>> class I2(int):    # New-style fuzzy integer
      ...     def __add__(self, j):
      ...         vals = map(int, [approx(self), approx(j)])
      ...         k = int.__add__(*vals)
      ...         return I2(int.__add__(k, 0x0F))
      ...     def err_range(self):
      ...         lbound = approx(self)
      ...         return "Value: [%d, %d)" %(lbound,lbound+0x0F)
      ...
      >>> i1, i2 = I2(29), I2(20)
      >>> print "i1 =", i1.err_range(),": i2 =", i2.err_range()
      i1 = Value: [16, 31) : i2 = Value: [16, 31)
      >>> i3 = i1 + i2
      >>> print i3, type(i3)
      47 <class '__main__.I2'>

  Since the new-style class 'int' already supports bitwise-and,
  there is no need to implement it again. With new-style classes,
  you refer to data values directly with 'self', rather than as an
  attribute that holds the data (e.g., 'self.i' in class 'I'). As
  well, it is generally unsafe to use syntactic operators within
  magic methods that define their operation; for example, I utilize
  the '.__add__()' method of the parent 'int' rather than the '+'
  operator in the 'I2.__add__()' method.
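
  The hazard is easiest to see side by side. In the sketch below
  (written for a current Python 3; the classes 'Naive' and 'Safe' are
  hypothetical), 'Naive.__add__()' uses '+' on 'self', which calls
  'Naive.__add__()' again without end, while 'Safe.__add__()'
  delegates to the parent's method:

```python
class Naive(int):
    def __add__(self, other):
        return Naive(self + other)     # '+' re-enters Naive.__add__()

class Safe(int):
    def __add__(self, other):
        # The parent's method does the arithmetic: no re-entry
        return Safe(int.__add__(self, other))

total = Safe(2) + Safe(3)

try:
    Naive(2) + Naive(3)
    recursed = False
except RuntimeError:                   # RecursionError is a subclass
    recursed = True
```

  Converting the operands to plain 'int' values first, as
  'I2.__add__()' does, is another way to sidestep the problem.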

  In practice, you are less likely to want to create number-like
  datatypes than you are to emulate container types. But it is
  worth understanding just how and why even plain integers are a
  fuzzy concept in Python (the fuzziness of the concepts is of a
  different sort than the fuzziness of 'I2' integers, though).
  Even a function that operates on whole numbers need not operate
  on objects of 'IntType' or 'LongType'--just on an object that
  satisfies the desired protocols.


  TOPIC -- Base Classes for Datatypes
  --------------------------------------------------------------------

  There are several magic methods that are often useful to define
  for -any- custom datatype.  In fact, these methods are useful
  even for classes that do not really define datatypes (in some
  sense, every object is a datatype since it can contain
  attribute values, but not every object supports special
  syntax such as arithmetic operators and indexing).  Not quite
  every magic method that you can define is documented in this
  book, but most are, each under the parent datatype it is most
  relevant to.  Moreover, each new version of Python has
  introduced a few additional magic methods; those covered
  either have been around for a few versions or are particularly
  important.

  In documenting class methods of base classes, the same general
  conventions are used as for documenting module functions.  The
  one special convention for these base class methods is the use
  of 'self' as the first argument to all methods.  Since the name
  'self' is purely arbitrary, this convention is less special
  than it might appear.  For example, both of the following uses
  of 'self' are equally legal:

      >>> import string
      >>> self = 'spam'
      >>> object.__repr__(self)
      '<str object at 0x12c0a0>'
      >>> string.upper(self)
      'SPAM'

  However, there is usually little reason to use class methods in
  place of perfectly good built-in and module functions with the
  same purpose.  Normally, these methods of datatype classes are
  used only in child classes that override the base classes, as
  in:

      >>> class UpperObject(object):
      ...       def __repr__(self):
      ...           return object.__repr__(self).upper()
      ...
      >>> uo = UpperObject()
      >>> print uo
      <__MAIN__.UPPEROBJECT OBJECT AT 0X1C2C6C>


  =================================================================
    BUILTIN -- object : Ancestor class for new-style datatypes
  =================================================================

  Under Python 2.2+, 'object' has become a base for new-style
  classes.  Inheriting from 'object' enables a custom class to
  use a few new capabilities, such as slots and properties.  But
  usually if you are interested in creating a custom datatype, it
  is better to inherit from a child of 'object', such as 'list',
  'float', or 'dict'.

  METHODS:

  object.__eq__(self, other)
      Return a Boolean comparison between 'self' and 'other'.
      Determines how a datatype responds to the '==' operator.
      The parent class 'object' does not implement '.__eq__()'
      since by default object equality means the same thing as
      identity (the 'is' operator).  A child is free to
      implement this in order to affect comparisons.

  object.__ne__(self, other)
      Return a Boolean comparison between 'self' and 'other'.
      Determines how a datatype responds to the '!=' and '<>'
      operators. The parent class 'object' does not implement
      '.__ne__()' since by default object inequality means the
      same thing as nonidentity (the 'is not' operator).
      Although it might seem that equality and inequality always
      return opposite values, the methods are not explicitly
      defined in terms of each other.  You could force the
      relationship with:

      >>> class EQ(object):
      ...     # Abstract parent class for equality classes
      ...     def __eq__(self, o): return not self <> o
      ...     def __ne__(self, o): return not self == o
      ...
      >>> class Comparable(EQ):
      ...     # By def'ing inequlty, get equlty (or vice versa)
      ...     def __ne__(self, other):
      ...         return someComplexComparison(self, other)

  object.__nonzero__(self)
      Return a Boolean value for an object.  Determines how a
      datatype responds to the Boolean comparisons 'or', 'and',
      and 'not', and to 'if' and 'filter(None,...)' tests.  An
      object whose '.__nonzero__()' method returns a true value
      is itself treated as a true value.

  object.__len__(self)
  len(object)
      Return an integer representing the "length" of the object.
      For collection types, this is fairly straightforward--how
      many objects are in the collection?  Custom types may
      change the behavior to some other meaningful value.

  object.__repr__(self)
  repr(object)
  object.__str__(self)
  str(object)
      Return a string representation of the object 'self'.
      Determines how a datatype responds to the `repr()` and
      `str()` built-in functions, to the 'print' keyword, and to the
      back-tick operator.

      Where feasible, it is desirable to have the '.__repr__()'
      method return a representation with sufficient information
      in it to reconstruct an identical object.  The goal here is
      to fulfill the equality 'obj==eval(repr(obj))'.  In many
      cases, however, you cannot encode sufficient information in
      a string, and the 'repr()' of an object is either identical
      to, or slightly more detailed than, the 'str()'
      representation of the same object.

      SEE ALSO, [repr], [operator]
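
  The methods above are easiest to appreciate together.  The
  class below is a hypothetical 'Playlist' container that I use
  only to exercise them; it is a sketch rather than a recommended
  design.  One assumption deserves a loud label: the Boolean hook
  is spelled '.__nonzero__()' in the Python versions this book
  covers but '.__bool__()' in Python 3, so the sketch defines
  both names.

```python
class Playlist(object):
    """Hypothetical container used only to exercise magic methods"""
    def __init__(self, *titles):
        self.titles = list(titles)

    def __eq__(self, other):
        return self.titles == other.titles

    def __ne__(self, other):
        # Force inequality to be the opposite of equality
        return not self == other

    def __len__(self):
        return len(self.titles)

    def __nonzero__(self):              # Python 2 spelling
        return len(self.titles) > 0
    __bool__ = __nonzero__              # Python 3 spelling

    def __repr__(self):
        # Enough information to reconstruct an equal object
        args = ", ".join([repr(t) for t in self.titles])
        return "Playlist(%s)" % args

    def __str__(self):
        return " / ".join(self.titles)

p = Playlist("spam", "eggs")
rebuilt = eval(repr(p))         # round trip through the repr string
```

  Because 'repr(p)' is a valid constructor call, 'rebuilt == p'
  holds, which is the equality 'obj==eval(repr(obj))' mentioned
  above.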


  =================================================================
    BUILTIN -- file : New-style base class for file objects
  =================================================================

  Under Python 2.2+, it is possible to create a custom file-like
  object by inheriting from the built-in class 'file'.  In older
  Python versions you may only create file-like objects by
  defining the methods that define an object as "file-like."
  However, even in recent versions of Python, inheritance from
  'file' buys you little--if the data contents come from
  somewhere other than a native filesystem, you will have to
  reimplement every method you wish to support.

  Even more than for other object types, what makes an object
  file-like is a fuzzy concept.  Depending on your purpose you
  may be happy with an object that can only read, or one that can
  only write.  You may need to seek within the object, or you may
  be happy with a linear stream.  In general, however, file-like
  objects are expected to read and write strings.  Custom classes
  only need implement those methods that are meaningful to them
  and should only be used in contexts where their capabilities
  are sufficient.
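
  As an illustration of "only those methods that are meaningful,"
  here is a minimal sketch of a read-only, forward-only file-like
  object.  The class 'UpperReader' and the helper 'count_lines()'
  are hypothetical names of my own, not part of any library, and
  the code uses syntax that current Python versions accept.

```python
class UpperReader(object):
    """Read-only, forward-only file-like object over a string.

    Hypothetical example class; it implements only the methods
    that make sense for it (no seeking, no writing).
    """
    def __init__(self, text, name="<upper-reader>"):
        self._lines = text.splitlines(True)   # keep line endings
        self.name = name
        self.mode = "r"
        self.closed = False

    def readline(self):
        if self._lines:
            return self._lines.pop(0).upper()
        return ""                             # empty string at EOF

    def read(self):
        data = "".join(self._lines).upper()
        self._lines = []
        return data

    def close(self):
        self.closed = True


def count_lines(fh):
    # Works with any object that supports .readline()
    n = 0
    while fh.readline():
        n += 1
    return n

n_lines = count_lines(UpperReader("spam\neggs\n"))
```

  A function such as 'count_lines()' neither knows nor cares
  whether it was handed a real file; a function that calls
  '.seek()' would fail on this object, which is why such objects
  belong only where their capabilities suffice.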

  In documenting the methods of file-like objects, I adopt a
  slightly different convention than for other built-in types.
  Since actually inheriting from 'file' is unusual, I use the
  capitalized name 'FILE' to indicate a general file-like object.
  Instances of the actual 'file' class are examples (and
  implement all the methods named), but other types of objects
  can be equally good 'FILE' instances.

  BUILT-IN FUNCTIONS:

  open(fname [,mode [,buffering]])
  file(fname [,mode [,buffering]])
      Return a file object that attaches to the filename 'fname'.
      The optional argument 'mode' describes the capabilities and
      access style of the object.  An 'r' mode is for reading;
      'w' for writing (truncating any existing content); 'a'
      for appending (writing to the end).  Each of these modes
      may also have the binary flag 'b' for platforms like
      Windows that distinguish text and binary files.  The flag
      '+' may be used to allow both reading and writing.  The
      argument 'buffering' may be 0 for none, 1 for line-oriented,
      or a larger integer for a buffer size in bytes.

      >>> open('tmp','w').write('spam and eggs\n')
      >>> print open('tmp','r').read(),
      spam and eggs
      >>> open('tmp','w').write('this and that\n')
      >>> print open('tmp','r').read(),
      this and that
      >>> open('tmp','a').write('something else\n')
      >>> print open('tmp','r').read(),
      this and that
      something else

  METHODS AND ATTRIBUTES:

  FILE.close()
      Close a file object.  Reading and writing are disallowed
      after a file is closed.

  FILE.closed
      Return a Boolean value indicating whether the file has been
      closed.

  FILE.fileno()
      Return a file descriptor number for the file.  File-like
      objects that do not attach to actual files should not
      implement this method.

  FILE.flush()
      Write any pending data to the underlying file.  File-like
      objects that do not cache data can still implement this
      method as 'pass'.

  FILE.isatty()
      Return a Boolean value indicating whether the file is a
      TTY-like device.  The standard documentation says that
      file-like objects that do not attach to actual files should
      not implement this method, but implementing it to always
      return '0' is probably a better approach.

  FILE.mode
      Attribute containing the mode of the file, normally
      identical to the 'mode' argument passed to the object's
      initializer.

  FILE.name
      The name of the file.  For file-like objects without a
      filesystem name, some string identifying the object should
      be put into this attribute.

  FILE.read([size=sys.maxint])
      Return a string containing up to 'size' bytes of content
      from the file.  Stop the read if an EOF is encountered or
      upon some other condition that makes sense for the object
      type.  Move the file position forward to just past the
      bytes read.  A negative 'size' argument is treated as the
      default value.

  FILE.readline([size=sys.maxint])
      Return a string containing one line from the file,
      including the trailing newline, if any.  A maximum of
      'size' bytes are read.  The file position is moved forward
      past the read. A negative 'size' argument is treated as the
      default value.

  FILE.readlines([size=sys.maxint])
      Return a list of lines from the file, each line including
      its trailing newline.  If the argument 'size' is given,
      limit the read to -approximately- 'size' bytes worth of
      lines.  The file position is moved forward past the bytes
      read.  A negative 'size' argument is treated as the
      default value.

  FILE.seek(offset [,whence=0])
      Move the file position by 'offset' bytes (positive or
      negative).  The argument 'whence' specifies where the
      initial file position is prior to the move:  0 for BOF; 1
      for current position; 2 for EOF.

  FILE.tell()
      Return the current file position.

  FILE.truncate([size])
      Truncate the file contents so that the file is at most
      'size' bytes long.  If 'size' is omitted, it defaults to
      the current file position.

  FILE.write(s)
      Write the string 's' to the file, starting at the current
      file position.  The file position is moved forward past the
      written bytes.

  FILE.writelines(lines)
      Write the lines in the sequence 'lines' to the file.  No
      newlines are added during the write.  The file position is
      moved forward past the written bytes.

  FILE.xreadlines()
      Memory-efficient iterator over lines in a file.  In Python
      2.2+, you might implement this as a generator that yields
      one line at a time.

      SEE ALSO, [xreadlines]
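
  Several of the methods above interact through the file
  position, which is easier to see in a short session than in
  prose.  The sketch below is my own illustration: it assumes a
  scratch file created with `tempfile.mkstemp()`, opens it in
  binary mode so that positions are plain byte counts on every
  platform, and uses byte-string literals that current Python
  versions accept.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # scratch file; the name is arbitrary
os.close(fd)

fh = open(path, "w+b")          # read and write, binary, truncating
fh.write(b"spam and eggs\n")
end = fh.tell()                 # position is just past the written bytes

fh.seek(0)                      # whence=0: offset from beginning of file
head = fh.read(4)               # reads b"spam", position is now 4

fh.seek(-5, 2)                  # whence=2: offset back from end of file
tail = fh.read()                # everything from there to EOF

fh.truncate(4)                  # keep only the first four bytes
fh.seek(0)
remaining = fh.read()

fh.close()
os.remove(path)
```

  Passing an explicit size to '.truncate()' keeps the result
  independent of wherever the earlier reads left the position.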

  =================================================================
    BUILTIN -- int : New-style base class for integer objects

  =================================================================
    BUILTIN -- long : New-style base class for long integers
  =================================================================

  In Python, there are two standard datatypes for representing
  integers.  Objects of type 'IntType' have a fixed range that
  depends on the underlying platform--usually between plus and
  minus 2**31.  Objects of type 'LongType' are unbounded in size.
  In Python 2.2+, operations on integers that exceed the range of
  an 'int' object result in automatic promotion to 'long'
  objects.  However, no operation on a 'long' will demote the
  result back to an 'int' object (even if the result is of small
  magnitude)--with the exception of the `int()` function, of
  course.

  From a user point of view ints and longs provide exactly the same
  interface. The difference between them is only in underlying
  implementation, with ints typically being significantly faster to
  operate on (since they use raw CPU instructions fairly directly).
  Most of the magic methods integers have are shared by floating
  point numbers as well and are discussed below. For example,
  consult the discussion of `float.__mul__()` for information on
  the corresponding `int.__mul__()` method. The special capability
  that integers have over floating point numbers is their ability
  to perform bitwise operations.
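
  A common use of that capability is packing several on/off
  settings into one integer.  The flag names and values below are
  hypothetical, chosen only for this sketch, which also confirms
  that a float refuses the same operators.

```python
READ, WRITE, EXEC = 0x4, 0x2, 0x1     # hypothetical flag values

perms = READ | WRITE                  # combine flags: 6
can_write = (perms & WRITE) != 0      # test a flag
no_write = perms & ~WRITE             # clear a flag: 4
toggled = perms ^ EXEC                # flip a flag: 7
shifted = 1 << 4                      # same value as 2**4

try:
    6.0 & 2                           # floats have no bitwise-and
    float_has_and = True
except TypeError:
    float_has_and = False
```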

  Under Python 2.2+, you may create a custom datatype that
  inherits from 'int' or 'long'; under earlier versions, you
  would need to manually define all the magic methods you wished
  to utilize (generally a lot of work, and probably not worth
  it).

  Each binary bit operation has a left-associative and a
  right-associative version.  If you define both versions and
  perform an operation on two custom objects, the
  left-associative version is chosen.  However, if you perform an
  operation with a basic 'int' and a custom object, the
  custom right-associative method will be chosen over the basic
  operation.  For example:

      >>> class I(int):
      ...     def __xor__(self, other):
      ...         return "XOR"
      ...     def __rxor__(self, other):
      ...         return "RXOR"
      ...
      >>> 0xFF ^ 0xFF
      0
      >>> 0xFF ^ I(0xFF)
      'RXOR'
      >>> I(0xFF) ^ 0xFF
      'XOR'
      >>> I(0xFF) ^ I(0xFF)
      'XOR'

  METHODS:

  int.__and__(self, other)
  int.__rand__(self, other)
      Return a bitwise-and between 'self' and 'other'.
      Determines how a datatype responds to the '&' operator.

  int.__hex__(self)
      Return a hex string representing 'self'.  Determines how a
      datatype responds to the built-in `hex()` function.

  int.__invert__(self)
      Return a bitwise inversion of 'self'.  Determines how a
      datatype responds to the '~' operator.

  int.__lshift__(self, other)
  int.__rlshift__(self, other)
      Return the result of bit-shifting 'self' to the left by
      'other' bits.  The right-associative version shifts 'other'
      by 'self' bits.  Determines how a datatype responds to the
      '<<' operator.

  int.__oct__(self)
      Return an octal string representing 'self'.  Determines how
      a datatype responds to the built-in `oct()` function.

  int.__or__(self, other)
  int.__ror__(self, other)
      Return a bitwise-or between 'self' and 'other'.
      Determines how a datatype responds to the '|' operator.

  int.__rshift__(self, other)
  int.__rrshift__(self, other)
      Return the result of bit-shifting 'self' to the right by
      'other' bits.  The right-associative version shifts 'other'
      by 'self' bits.  Determines how a datatype responds to the
      '>>' operator.

  int.__xor__(self, other)
  int.__rxor__(self, other)
      Return a bitwise-xor between 'self' and 'other'.
      Determines how a datatype responds to the '^' operator.

  SEE ALSO, [float], `int`, `long`, `sys.maxint`, [operator]

  =================================================================
    BUILTIN -- float : New-style base class for floating point numbers
  =================================================================

  Python floating point numbers are mostly implemented using the
  underlying C floating point library of your platform; that is, to
  a greater or lesser degree based on the IEEE 754 standard. A
  complex number is just a Python object that wraps a pair of
  floats with a few extra operations on these pairs.

  DIGRESSION:

  Although the details are far outside the scope of this book, a
  general warning is in order.  Floating point math is harder
  than you think!  If you think you -understand- just how complex
  IEEE 754 math is, you are not yet aware of all of the
  subtleties.  By way of indication, Python luminary and
  erstwhile professor of numeric computing Alex Martelli
  commented in 2001 (on '<comp.lang.python>'):

    Anybody who thinks he knows what he's doing when floating
    point is involved IS either naive, or Tim Peters (well, it
    COULD be W. Kahan I guess, but I don't think he writes here).

  Fellow Python guru Tim Peters observed:

    I find it's possible to be both (wink).  But *nothing* about
    fp comes easily to anyone, and even Kahan works his butt off
    to come up with the amazing things that he does.

  Peters illustrated further by way of Donald Knuth (_The Art of
  Computer Programming_, Third Edition, Addison-Wesley, 1997; ISBN:
  0201896842, vol. 2, p. 229):

    Many serious mathematicians have attempted to analyze a
    sequence of floating point operations rigorously, but found
    the task so formidable that they have tried to be content
    with plausibility arguments instead.

  The trick about floating point numbers is that although they
  are extremely useful for representing real-life (fractional)
  quantities, operations on them do not obey all the arithmetic
  rules we learned in middle school (associativity and
  distributivity can fail, for example); moreover, many very
  ordinary-seeming numbers can be represented only approximately
  with floating point numbers.
  For example:

      >>> 1./3
      0.33333333333333331
      >>> .3
      0.29999999999999999
      >>> 7 == 7./25 * 25
      0
      >>> 7 == 7./24 * 24
      1
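
  The failure of associativity can be shown just as briefly.  In
  the sketch below, the helper 'close_enough()' and its default
  tolerance are my own assumptions, offered as one common way to
  compare floats rather than as a universally correct one.

```python
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
same = (left == right)          # False: grouping changes the rounding

def close_enough(x, y, tol=1e-9):
    # Hypothetical helper: compare within a relative tolerance
    # instead of testing exact equality
    return abs(x - y) <= tol * max(abs(x), abs(y), 1.0)

nearly = close_enough(left, right)      # True
```

  Choosing a tolerance is itself a judgment about the problem at
  hand, which is part of why floating point math resists simple
  rules.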

  CAPABILITIES:

  In the hierarchy of Python numeric types, floating point
  numbers are higher up the scale than integers, and complex
  numbers higher than floats.  That is, operations on mixed types
  get promoted upwards.  However, the magic methods that make a
  datatype "float-like" are strictly a subset of those associated
  with integers.  All of the magic methods listed below for
  floats apply equally to ints and longs (or integer-like custom
  datatypes).  Complex numbers support a few additional methods.
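
  The upward promotion is simple to observe.  This short sketch
  uses arbitrary values of my own choosing and inspects only the
  type of each mixed result.

```python
i, f, c = 2, 2.5, 1 + 1j

int_float = i + f           # int operand is promoted to float
float_complex = f * c       # float operand is promoted to complex
int_complex = i - c         # int operand is promoted to complex

result_types = [type(int_float), type(float_complex), type(int_complex)]
```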

  Under Python 2.2+, you may create a custom datatype that
  inherits from 'float' or 'complex'; under earlier versions, you
  would need to manually define all the magic methods you wished
  to utilize (generally a lot of work, and probably not worth
  it).

  Each binary operation has a left-associative and a
  right-associative version. If you define both versions and
  perform an operation on two custom objects, the left-associative
  version is chosen. However, if you perform an operation with a
  basic datatype and a custom object, the custom right-associative
  method will be chosen over the basic operation. See the example
  under [int].

  METHODS:

  float.__abs__(self)
      Return the absolute value of 'self'.  Determines how a
      datatype responds to the built-in function `abs()`.

  float.__add__(self, other)
  float.__radd__(self, other)
      Return the sum of 'self' and 'other'. Determines how a
      datatype responds to the '+' operator.

  float.__cmp__(self, other)
      Return a value indicating the order of 'self' and 'other'.
      Determines how a datatype responds to the numeric comparison
      operators '<', '>', '<=', '>=', '==', '<>', and '!='.  Also
      determines the behavior of the built-in `cmp()` function.
      Should return -1 for 'self<other', 0 for 'self==other', and
      1 for 'self>other'.  If other comparison methods are
      defined, they take precedence over '.__cmp__()':
      '.__ge__()', '.__gt__()', '.__le__()', and '.__lt__()'.

  float.__div__(self, other)
  float.__rdiv__(self, other)
      Return the ratio of 'self' and 'other'. Determines how a
      datatype responds to the '/' operator.  In Python 2.2+,
      when 'from __future__ import division' is in effect, the
      '/' operator is handled by '.__truediv__()' instead.

  float.__divmod__(self, other)
  float.__rdivmod__(self, other)
      Return the pair '(div, remainder)'.  Determines how a
      datatype responds to the built-in `divmod()` function.

  float.__floordiv__(self, other)
  float.__rfloordiv__(self, other)
      Return the number of whole times 'other' goes into 'self'.
      Determines how a datatype responds to the Python 2.2+
      floor division operator '//'.

  float.__mod__(self, other)
  float.__rmod__(self, other)
      Return the remainder of dividing 'self' by 'other'.
      Determines how a datatype responds to the '%' operator.

  float.__mul__(self, other)
  float.__rmul__(self, other)
      Return the product of 'self' and 'other'.  Determines how a
      datatype responds to the '*' operator.

  float.__neg__(self)
      Return the negative of 'self'.  Determines how a datatype
      responds to the unary '-' operator.

  float.__pow__(self, other)
  float.__rpow__(self, other)
      Return 'self' raised to the 'other' power.  Determines how
      a datatype responds to the '**' operator and to the
      built-in `pow()` function.

  float.__sub__(self, other)
  float.__rsub__(self, other)
      Return the difference between 'self' and 'other'.
      Determines how a datatype responds to the binary '-'
      operator.

  float.__truediv__(self, other)
  float.__rtruediv__(self, other)
      Return the ratio of 'self' and 'other'.  Determines how a
      datatype responds to the true division operator '/', which
      is enabled in Python 2.2+ by 'from __future__ import
      division'.

  SEE ALSO, [complex], [int], `float`, [operator]
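
  To tie a few of these methods to their operators, here is a
  minimal sketch.  The class 'Meters' is hypothetical, invented
  for this illustration; it overrides only addition, so that the
  difference between overridden and inherited operations shows
  up in the result types.

```python
class Meters(float):
    """Hypothetical float subclass whose sums stay Meters"""
    def __add__(self, other):
        return Meters(float.__add__(self, other))
    __radd__ = __add__          # addition is symmetric here
    def __repr__(self):
        return "Meters(%s)" % float.__repr__(self)

a = Meters(1.5)
left = a + 2                # Meters.__add__(a, 2)
right = 2 + a               # int declines, so Meters.__radd__(a, 2) runs
product = a * 2             # not overridden: result is a plain float

whole = 7.0 // 2            # .__floordiv__(): 3.0
rest = 7.0 % 2              # .__mod__(): 1.0
both = divmod(7.0, 2)       # .__divmod__(): (3.0, 1.0)
cube = 2.0 ** 3             # .__pow__(): 8.0
```

  As with the "fuzzy integer" earlier, the parent's method is
  called by name inside '.__add__()' rather than through the '+'
  operator.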

  =================================================================
    BUILTIN -- complex : New-style base class for complex numbers
  =================================================================

  Complex numbers implement all the above documented methods of
  floating point numbers, and a few additional ones.

  Inequality operations on complex numbers are not supported in
  recent versions of Python, even though they were previously. In
  Python 2.1+, the methods `complex.__ge__()`, `complex.__gt__()`,
  `complex.__le__()`, and `complex.__lt__()` all raise 'TypeError'
  rather than return Boolean values indicating the order. There is
  a certain logic to this change inasmuch as complex numbers do not
  have a "natural" ordering.  But there is also significant
  breakage with this change--this is one of the few changes in
  Python, since version 1.4 when I started using it, that I feel
  was a real mistake.  The important breakage comes when you
  want to sort a list of various things, some of which might be
  complex numbers:

      >>> lst = ["string", 1.0, 1, 1L, ('t','u','p')]
      >>> lst.sort()
      >>> lst
      [1.0, 1, 1L, 'string', ('t', 'u', 'p')]
      >>> lst.append(1j)
      >>> lst.sort()
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      TypeError: cannot compare complex numbers using <, <=, >, >=

  It is true that there is no obvious correct ordering between a
  complex number and another number (complex or otherwise), but
  there is also no natural ordering between a string, a tuple, and
  a number. Nonetheless, it is frequently useful to sort a
  heterogeneous list in order to create a canonical (even if
  meaningless) order.  In Python 2.2+, you can remedy this
  shortcoming of recent Python versions in the style below (under
  2.1 you are largely out of luck):

      >>> class C(complex):
      ...   def __lt__(self, o):
      ...     if hasattr(o, 'imag'):
      ...       return (self.real,self.imag) < (o.real,o.imag)
      ...     else:
      ...       return self.real < o
      ...   def __le__(self, o): return self < o or self==o
      ...   def __gt__(self, o): return not (self==o or self < o)
      ...   def __ge__(self, o): return self > o or self==o
      ...
      >>> lst = ["str", 1.0, 1, 1L, (1,2,3), C(1+1j), C(2-2j)]
      >>> lst.sort()
      >>> lst
      [1.0, 1, 1L, (1+1j), (2-2j), 'str', (1, 2, 3)]

  Of course, if you adopt this strategy, you have to create all
  of your complex values using the custom datatype 'C'.  And
  unfortunately, unless you override arithmetic operations also,
  a binary operation between a 'C' object and another number
  reverts to a basic complex datatype.  The reader can work out
  the details of this solution if she needs it.
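
  One way those details might begin is sketched below, under
  stated assumptions.  The class 'Plain' is a hypothetical
  do-nothing subclass added for contrast, only addition is
  overridden, and the list being sorted holds 'C' values only,
  since current Python versions refuse to order unrelated types
  at all.

```python
class Plain(complex):
    pass                        # inherits everything unchanged

class C(complex):
    def __lt__(self, o):
        if hasattr(o, 'imag'):
            return (self.real, self.imag) < (o.real, o.imag)
        else:
            return self.real < o
    def __add__(self, o):
        # Rewrap the parent's result so the sum is still a C
        return C(complex.__add__(self, o))
    __radd__ = __add__

reverted = Plain(1+1j) + 1      # falls back to a basic complex
kept = C(1+1j) + 1              # still a C, value (2+1j)
ordered = sorted([C(2-2j), kept, C(1+1j)])
```

  Every other arithmetic method would need the same rewrapping
  treatment before 'C' values could safely flow through longer
  calculations.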

  METHODS:

  complex.conjugate(self)
      Return the complex conjugate of 'self'.  A quick refresher
      here: If 'self' is 'n+mj' its conjugate is 'n-mj'.

  complex.imag
      Imaginary component of a complex number.

  complex.real
      Real component of a complex number.

  SEE ALSO, [float], `complex`

  =================================================================
    MODULE -- UserDict : Custom wrapper around dictionary objects

  =================================================================
    BUILTIN -- dict : New-style base class for dictionary objects
  =================================================================

  Dictionaries in Python provide a well-optimized mapping between
  immutable objects and other Python objects (see Glossary entry
  on "immutable").  You may create custom datatypes that respond
  to various dictionary operations.  There are a few syntactic
  operations associated with dictionaries, all involving indexing
  with square braces.  But unlike with numeric datatypes, there
  are several regular methods that are reasonable to consider as
  part of the general interface for dictionary-like objects.

  If you create a dictionary-like datatype by subclassing from
  `UserDict.UserDict`, all the special methods defined by the
  parent are proxies to the true dictionary stored in the
  object's '.data' member.  If, under Python 2.2+, you subclass
  from 'dict' itself, the object itself inherits dictionary
  behaviors.  In either case, you may customize whichever methods
  you wish.  Below is an example of the two styles for
  subclassing a dictionary-like datatype:

      >>> from sys import stderr
      >>> from UserDict import UserDict
      >>> class LogDictOld(UserDict):
      ...    def __setitem__(self, key, val):
      ...       stderr.write("Set: "+str(key)+"->"+str(val)+"\n")
      ...       self.data[key] = val
      ...
      >>> ldo = LogDictOld()
      >>> ldo['this'] = 'that'
      Set: this->that
      >>> class LogDictNew(dict):
      ...    def __setitem__(self, key, val):
      ...       stderr.write("Set: "+str(key)+"->"+str(val)+"\n")
      ...       dict.__setitem__(self, key, val)
      ...
      >>> ldn = LogDictNew()
      >>> ldn['this'] = 'that'
      Set: this->that

  METHODS:

  dict.__cmp__(self, other)
  UserDict.UserDict.__cmp__(self, other)
      Return a value indicating the order of 'self' and 'other'.
      Determines how a datatype responds to the numeric comparison
      operators '<', '>', '<=', '>=', '==', '<>', and '!='.  Also
      determines the behavior of the built-in `cmp()` function.
      Should return -1 for 'self<other', 0 for 'self==other', and
      1 for 'self>other'.  If other comparison methods are
      defined, they take precedence over '.__cmp__()':
      '.__ge__()', '.__gt__()', '.__le__()', and '.__lt__()'.

  dict.__contains__(self, x)
  UserDict.UserDict.__contains__(self, x)
      Return a Boolean value indicating whether 'self' "contains"
      the value 'x'.  By default, being contained in a dictionary
      means matching one of its keys, but you can change this
      behavior by overriding it (e.g., check whether 'x' is in a
      value rather than a key).  Determines how a datatype
      responds to the 'in' operator.

  dict.__delitem__(self, x)
  UserDict.UserDict.__delitem__(self, x)
      Remove an item from a dictionary-like datatype.  By
      default, removing an item means removing the pair whose
      key equals 'x'.  Determines how a datatype responds to the
      'del' statement, as in: 'del self[x]'.

  dict.__getitem__(self, x)
  UserDict.UserDict.__getitem__(self, x)
      By default, return the value associated with the key 'x'.
      Determines how a datatype responds to indexing with square
      braces.  You may override this method to either search
      differently or return special values.  For example:

      >>> class BagOfPairs(dict):
      ...     def __getitem__(self, x):
      ...         if self.has_key(x):
      ...             return (x, dict.__getitem__(self,x))
      ...         else:
      ...             tmp = dict([(v,k) for k,v in self.items()])
      ...             return (dict.__getitem__(tmp,x), x)
      ...
      >>> bop = BagOfPairs({'this':'that', 'spam':'eggs'})
      >>> bop['this']
      ('this', 'that')
      >>> bop['eggs']
      ('spam', 'eggs')
      >>> bop['bacon'] = 'sausage'
      >>> bop
      {'this': 'that', 'bacon': 'sausage', 'spam': 'eggs'}
      >>> bop['nowhere']
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        File "<stdin>", line 7, in __getitem__
      KeyError: nowhere

  dict.__len__(self)
  UserDict.UserDict.__len__(self)
      Return the length of the dictionary.  By default this is
      simply a count of the key/val pairs, but you could perform
      a different calculation if you wished (e.g., perhaps you
      would cache the size of a record set returned from a
      database query that emulated a dictionary).  Determines how
      a datatype responds to the built-in `len()` function.

  dict.__setitem__(self, key, val)
  UserDict.UserDict.__setitem__(self, key, val)
      Set the dictionary key 'key' to value 'val'.  Determines
      how a datatype responds to indexed assignment; that is,
      'self[key]=val'.   A custom version might actually perform
      some calculation based on 'val' and/or 'key' before adding
      an item.

  dict.clear(self)
  UserDict.UserDict.clear(self)
      Remove all items from 'self'.

  dict.copy(self)
  UserDict.UserDict.copy(self)
      Return a copy of the dictionary 'self' (i.e., a distinct
      object with the same items).

  dict.get(self, key [,default=None])
  UserDict.UserDict.get(self, key [,default=None])
      Return the value associated with the key 'key'.  If no item
      with the key exists, return 'default' instead of raising a
      'KeyError'.

  dict.has_key(self, key)
  UserDict.UserDict.has_key(self, key)
      Return a Boolean value indicating whether 'self' has the
      key 'key'.

  dict.items(self)
  UserDict.UserDict.items(self)
  dict.iteritems(self)
  UserDict.UserDict.iteritems(self)
      Return the items in a dictionary, in an unspecified order.
      The '.items()' method returns a true list of '(key,val)'
      pairs, while the '.iteritems()' method (in Python 2.2+)
      returns a generator object that successively yields items.
      The latter method is useful if your dictionary is not a
      true in-memory structure, but rather some sort of
      incremental query or calculation.  Both methods behave the
      same way when used in a 'for' loop:

      >>> d = {1:2, 3:4}
      >>> for k,v in d.iteritems(): print k,v,':',
      ...
      1 2 : 3 4 :
      >>> for k,v in d.items(): print k,v,':',
      ...
      1 2 : 3 4 :

  dict.keys(self)
  UserDict.UserDict.keys(self)
  dict.iterkeys(self)
  UserDict.UserDict.iterkeys(self)
      Return the keys in a dictionary, in an unspecified order.
      The '.keys()' method returns a true list of keys, while the
      '.iterkeys()' method (in Python 2.2+) returns a generator
      object.

      SEE ALSO, `dict.items()`

  dict.popitem(self)
  UserDict.UserDict.popitem(self)
      Return a '(key,val)' pair for the dictionary, or raise a
      'KeyError' if the dictionary is empty.  Removes the
      returned item from the dictionary.  As with other
      dictionary methods, the order in which items are popped is
      unspecified (and can vary between versions and platforms).

  dict.setdefault(self, key [,default=None])
  UserDict.UserDict.setdefault(self, key [,default=None])
      If 'key' is currently in the dictionary, return the
      corresponding value.  If 'key' is not currently in the
      dictionary, set 'self[key]=default', then return 'default'.

      SEE ALSO, `dict.get()`

  dict.update(self, other)
  UserDict.UserDict.update(self, other)
      Update the dictionary 'self' using the dictionary 'other'.
      If a key in 'other' already exists in 'self', the
      corresponding value from 'other' is used in 'self'.  If a
      '(key,val)' pair in 'other' is not in 'self', it is added.
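
      As a small illustration (the dictionary names and contents
      here are invented), '.update()' can layer specific values
      over a set of defaults:

```python
defaults = {'salutation': 'Dear', 'closing': 'Sincerely,'}
overrides = {'closing': 'Regards,', 'fname': 'David'}
fields = defaults.copy()     # leave 'defaults' itself untouched
fields.update(overrides)     # values from 'overrides' win on shared keys
```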

  dict.values(self)
  UserDict.UserDict.values(self)
  dict.itervalues(self)
  UserDict.UserDict.itervalues(self)
      Return the values in a dictionary, in an unspecified order.
      The '.values()' method returns a true list of values, while
      the '.itervalues()' method (in Python 2.2+) returns a
      generator object.

      SEE ALSO, `dict.items()`

  SEE ALSO, `dict`, [list], [operator]

  =================================================================
    MODULE -- UserList : Custom wrapper around list objects

  =================================================================
    BUILTIN -- list : New-style base class for list objects

  =================================================================
    BUILTIN -- tuple : New-style base class for tuple objects
  =================================================================

  A Python list is a (possibly) heterogeneous mutable sequence of
  Python objects.  A tuple is a similar immutable sequence (see
  Glossary entry on "immutable").  Most of the magic methods of
  lists and tuples are the same, but a tuple does not have those
  methods associated with internal transformation.

  If you create a list-like datatype by subclassing from
  `UserList.UserList`, all the special methods defined by the
  parent are proxies to the true list stored in the object's
  '.data' member. If, under Python 2.2+, you subclass from 'list'
  (or 'tuple') itself, the object itself inherits list (tuple)
  behaviors. In either case, you may customize whichever methods
  you wish.  The discussion of [dict] and [UserDict] shows an
  example of the different styles of specialization.

  The difference between a list-like object and a tuple-like
  object runs less deep than you might think. Mutability is only
  really important for using objects as dictionary keys, but
  dictionaries only check the mutability of an object by examining
  the return value of an object's '.__hash__()' method. If this
  method fails to return an integer, an object is considered
  mutable (and ineligible to serve as a dictionary key).  The
  reason that tuples are useful as keys is that every tuple
  composed of the same items has the same hash; two lists (or
  dictionaries), by contrast, may also have the same items, but
  only as a passing matter (since either can be changed).

  You can easily give a hash value to a list-like datatype.
  However, there is an obvious and wrong way to do so:

      >>> class L(list):
      ...     __hash__ = lambda self: hash(tuple(self))
      ...
      >>> lst = L([1,2,3])
      >>> dct = {lst:33, 7:8}
      >>> print dct
      {[1, 2, 3]: 33, 7: 8}
      >>> dct[lst]
      33
      >>> lst.append(4)
      >>> print dct
      {[1, 2, 3, 4]: 33, 7: 8}
      >>> dct[lst]
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      KeyError: [1, 2, 3, 4]

  As soon as 'lst' changes, its hash changes, and you cannot
  reach the dictionary item keyed to it.  What you need is
  something that does not change as the object changes:

      >>> class L(list):
      ...     __hash__ = lambda self: id(self)
      ...
      >>> lst = L([1,2,3])
      >>> dct = {lst:33, 7:8}
      >>> dct[lst]
      33
      >>> lst.append(4)
      >>> dct
      {[1, 2, 3, 4]: 33, 7: 8}
      >>> dct[lst]
      33

  As with most everything about Python datatypes and operations,
  mutability is merely a protocol that you can choose to support
  or not support in your custom datatypes.
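
  An identity-based hash has a cost that is worth seeing once:
  two list-like objects with equal contents no longer find each
  other's dictionary entries, as two equal tuples would.  The
  sketch below uses a hypothetical class name 'IdList' and is
  only meant to show that trade-off:

```python
class IdList(list):
    # Hash by identity rather than by contents
    def __hash__(self):
        return id(self)

lst = IdList([1, 2, 3])
twin = IdList([1, 2, 3])      # equal contents, different object
dct = {lst: 33}

found_original = lst in dct   # same object, so same hash
found_twin = twin in dct      # equal to 'lst', but hashes differently
found_tuple = (1, 2, 3) in {(1, 2, 3): 33}   # tuples hash by contents
```

  Whether that behavior is what you want depends on whether your
  keys are meant to stand for particular objects or for values.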

  Sequence datatypes may choose to support order comparisons--in
  fact they probably should. The methods '.__cmp__()', '.__ge__()',
  '.__gt__()', '.__le__()', and '.__lt__()' have the same meanings
  for sequences that they do for other datatypes; see [operator],
  [float], and [dict] for details.

  METHODS:

  list.__add__(self, other)
  UserList.UserList.__add__(self, other)
  tuple.__add__(self, other)
  list.__iadd__(self, other)
  UserList.UserList.__iadd__(self, other)
      Determine how a datatype responds to the '+' and '+='
      operators.  Augmented assignments ("in-place add") are
      supported in Python 2.0+.  For list-like datatypes,
      normally the statements 'lst+=other' and 'lst=lst+other'
      have the same effect, but the augmented version might be
      more efficient.

      Under standard meaning, addition of the two sequence
      objects produces a new (distinct) sequence object with all
      the items in both 'self' and 'other'.  An in-place add
      ('.__iadd__') mutates the left-hand object without creating
      a new object.  A custom datatype might choose to give a
      special meaning to addition, perhaps depending on the
      datatype of the object added in.  For example:

      >>> class XList(list):
      ...     def __iadd__(self, other):
      ...         if issubclass(other.__class__, list):
      ...             return list.__iadd__(self, other)
      ...         else:
      ...             from operator import add
      ...             return map(add, self, [other]*len(self))
      ...
      >>> xl = XList([1,2,3])
      >>> xl += [4,5,6]
      >>> xl
      [1, 2, 3, 4, 5, 6]
      >>> xl += 10
      >>> xl
      [11, 12, 13, 14, 15, 16]
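
      The difference between the standard '+' and '+=' on plain
      lists shows up once a second name is bound to the same
      object.  A short sketch, with variable names chosen only
      for illustration:

```python
a = [1, 2, 3]
alias = a              # second name bound to the same list object
a += [4]               # in-place: 'alias' sees the change
same_after_iadd = a is alias

b = [1, 2, 3]
other = b
b = b + [4]            # new object: 'other' still refers to the old list
same_after_add = b is other
```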

  list.__contains__(self, x)
  UserList.UserList.__contains__(self, x)
  tuple.__contains__(self, x)
      Return a Boolean value indicating whether 'self' contains
      the value 'x'.  Determines how a datatype responds to the
      'in' operator.

  list.__delitem__(self, x)
  UserList.UserList.__delitem__(self, x)
      Remove an item from a list-like datatype. Determines how a
      datatype responds to the 'del' statement, as in
      'del self[x]'.

  list.__delslice__(self, start, end)
  UserList.UserList.__delslice__(self, start, end)
      Remove a range of items from a list-like datatype.
      Determines how a datatype responds to the 'del' statement
      applied to a slice, as in 'del self[start:end]'.

  list.__getitem__(self, pos)
  UserList.UserList.__getitem__(self, pos)
  tuple.__getitem__(self, pos)
      Return the value at offset 'pos' in the list. Determines
      how a datatype responds to indexing with square brackets.
      The default behavior on list indices is to raise an
      'IndexError' for nonexistent offsets.

  list.__getslice__(self, start, end)
  UserList.UserList.__getslice__(self, start, end)
  tuple.__getslice__(self, start, end)
      Return a subsequence of the sequence 'self'.  Determines
      how a datatype responds to indexing with a slice parameter,
      as in 'self[start:end]'.

  list.__hash__(self)
  UserList.UserList.__hash__(self)
  tuple.__hash__(self)
      Return an integer that distinctly identifies an object.
      Determines how a datatype responds to the built-in `hash()`
      function--and probably more importantly the hash is used
      internally in dictionaries. By default, tuples (and other
      immutable types) will return hash values but lists will
      raise a 'TypeError'.  Dictionaries will handle hash
      collisions gracefully, but it is best to try to make hashes
      unique per object.

      >>> hash(219750523), hash((1,2))
      (219750523, 219750523)
      >>> dct = {219750523:1, (1,2):2}
      >>> dct[219750523]
      1

  list.__len__(self)
  UserList.UserList.__len__(self)
  tuple.__len__(self)
      Return the length of a sequence. Determines how a datatype
      responds to the built-in `len()` function.

  list.__mul__(self, num)
  UserList.UserList.__mul__(self, num)
  tuple.__mul__(self, num)
  list.__rmul__(self, num)
  UserList.UserList.__rmul__(self, num)
  tuple.__rmul__(self, num)
  list.__imul__(self, num)
  UserList.UserList.__imul__(self, num)
      Determine how a datatype responds to the '*' and '*='
      operators.  Augmented assignments ("in-place multiply") are
      supported in Python 2.0+.  For list-like datatypes,
      normally the statements 'lst*=other' and 'lst=lst*other'
      have the same effect, but the augmented version might be
      more efficient.

      The right-associative version '.__rmul__()' determines the
      value of 'num*self', the left-associative '.__mul__()'
      determines the value of 'self*num'. Under standard meaning,
      the product of a sequence and a number produces a new
      (distinct) sequence object with the items in 'self'
      duplicated 'num' times:

      >>> [1,2,3] * 3
      [1, 2, 3, 1, 2, 3, 1, 2, 3]

  list.__setitem__(self, pos, val)
  UserList.UserList.__setitem__(self, pos, val)
      Set the value at offset 'pos' to value 'val'. Determines
      how a datatype responds to indexed assignment; that is,
      'self[pos]=val'.  A custom version might actually perform
      some calculation based on 'val' and/or 'pos' before adding
      an item.
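
      A sketch of such a custom version follows.  The class name
      'StrippedList' is hypothetical, and the sketch handles only
      assignment to a single offset, not slice assignment:

```python
class StrippedList(list):
    """Hypothetical list that stores string values without padding."""
    def __setitem__(self, pos, val):
        # Transform 'val' before handing off to the normal list behavior
        list.__setitem__(self, pos, val.strip())

fields = StrippedList(['', ''])
fields[0] = '  David  '
fields[1] = 'Mertz\n'
```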

  list.__setslice__(self, start, end, other)
  UserList.UserList.__setslice__(self, start, end, other)
      Replace the subsequence 'self[start:end]' with the sequence
      'other'.  The replaced and new sequences are not
      necessarily the same length, and the resulting sequence
      might be longer or shorter than 'self'. Determines how a
      datatype responds to assignment to a slice, as in
      'self[start:end]=other'.

  list.append(self, item)
  UserList.UserList.append(self, item)
      Add the object 'item' to the end of the sequence 'self'.
      Increases the length of 'self' by one.

  list.count(self, item)
  UserList.UserList.count(self, item)
      Return the integer number of occurrences of 'item' in
      'self'.

  list.extend(self, seq)
  UserList.UserList.extend(self, seq)
      Add each item in 'seq' to the end of the sequence 'self'.
      Increases the length of 'self' by 'len(seq)'.

  list.index(self, item)
  UserList.UserList.index(self, item)
      Return the offset index of the first occurrence of 'item'
      in 'self'.

  list.insert(self, pos, item)
  UserList.UserList.insert(self, pos, item)
      Add the object 'item' to the sequence 'self' before the
      offset 'pos'.  Increases the length of 'self' by one.

  list.pop(self [,pos=-1])
  UserList.UserList.pop(self [,pos=-1])
      Return the item at offset 'pos' of the sequence 'self',
      and remove the returned item from the sequence.  By
      default, remove the last item, which lets a list act like a
      stack using the '.pop()' and '.append()' operations.
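
      As an illustration of the stack idiom, the sketch below
      checks whether brackets in a string nest properly.  The
      function name 'balanced' and its 'pairs' argument are
      assumptions made for this example only:

```python
def balanced(text, pairs={')': '(', ']': '[', '}': '{'}):
    """Return a true value if brackets in 'text' nest properly."""
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)            # push an opening bracket
        elif ch in pairs:
            # the popped item must be the matching opener
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                    # leftovers mean unclosed brackets
```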

  list.remove(self, item)
  UserList.UserList.remove(self, item)
      Remove the first occurrence of 'item' in 'self'.  Decreases
      the length of 'self' by one.

  list.reverse(self)
  UserList.UserList.reverse(self)
      Reverse the list 'self' in place.

  list.sort(self [,cmpfunc])
  UserList.UserList.sort(self [,cmpfunc])
      Sort the list 'self' in place.  If a comparison function
      'cmpfunc' is given, perform comparisons using that function.
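
      A comparison function returns a negative number, zero, or
      a positive number.  The sketch below sorts words by length;
      it assumes a Python recent enough to provide
      'functools.cmp_to_key' (2.7 or 3.2+), which adapts such a
      function for interpreters whose '.sort()' no longer takes
      one directly.  In the Python versions this book describes,
      'words.sort(by_length)' would be the equivalent call.

```python
from functools import cmp_to_key   # assumption: Python 2.7 or 3.2+

def by_length(a, b):
    # Old-style comparison function: negative, zero, or positive
    return len(a) - len(b)

words = ['pear', 'fig', 'banana', 'kiwi']
words.sort(key=cmp_to_key(by_length))
```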

  SEE ALSO, `list`, `tuple`, [dict], [operator]

  =================================================================
    MODULE -- UserString : Custom wrapper around string objects

  =================================================================
    BUILTIN -- str : New-style base class for string objects
  =================================================================

  A string in Python is an immutable sequence of characters (see
  Glossary entry on "immutable").  There is special syntax for
  creating strings--single and triple quoting, character
  escaping, and so on--but in terms of object behaviors and magic
  methods, most of what a string does a tuple does, too.  Both may
  be sliced and indexed, and both respond to pseudo-arithmetic
  operators '+' and '*'.

  For the [str] and [UserString] magic methods that are strictly a
  matter of the sequence quality of strings, see the corresponding
  [tuple] documentation. These include `str.__add__()`,
  `str.__getitem__()`, `str.__getslice__()`, `str.__hash__()`,
  `str.__len__()`, `str.__mul__()`, and `str.__rmul__()`. Each of
  these methods is also defined in [UserString]. The [UserString]
  module also includes a few explicit definitions of magic methods
  that are not in the new-style [str] class:
  `UserString.__iadd__()`, `UserString.__imul__()`, and
  `UserString.__radd__()`. However, you may define your own
  implementations of these methods, even if you inherit from [str]
  (in Python 2.2+). In any case, internally, in-place operations
  are still performed on all strings.

  Strings have quite a number of nonmagic methods as well.  If
  you wish to create a custom datatype that can be utilized in
  the same functions that expect strings, you may want to
  specialize some of these common string methods.  The behavior
  of string methods is documented in the discussion of the
  [string] module, even for the few string methods that are not
  also defined in the [string] module.  However, inheriting from
  either [str] or [UserString] provides very reasonable default
  behaviors for all these methods.

  SEE ALSO, `"".capitalize()`, `"".title()`, `"".center()`,
  `"".count()`, `"".endswith()`, `"".expandtabs()`, `"".find()`,
  `"".index()`, `"".isalpha()`, `"".isalnum()`, `"".isdigit()`,
  `"".islower()`, `"".isspace()`, `"".istitle()`, `"".isupper()`,
  `"".join()`, `"".ljust()`, `"".lower()`, `"".lstrip()`,
  `"".replace()`, `"".rfind()`, `"".rindex()`, `"".rjust()`,
  `"".rstrip()`, `"".split()`, `"".splitlines()`,
  `"".startswith()`, `"".strip()`, `"".swapcase()`,
  `"".translate()`, `"".upper()`, `"".encode()`

  METHODS:

  str.__contains__(self, x)
  UserString.UserString.__contains__(self, x)
      Return a Boolean value indicating whether 'self' contains
      the character 'x'.  Determines how a datatype responds to
      the 'in' operator.

      In Python versions through 2.2, the 'in' operator applied to
      strings has a semantics that tends to trip me up.  Fortunately,
      Python 2.3+ has the behavior that I expect.  In older Python
      versions, 'in' can only be used to determine the
      presence of a single character in a string--this makes sense
      if you think of a string as a sequence of characters, but I
      nonetheless intuitively want something like the code below
      to work:

      >>> s = "The cat in the hat"
      >>> if "the" in s: print "Has definite article"
      ...
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      TypeError: 'in <string>' requires character as left operand

      It is easy to get the "expected" behavior in a custom
      string-like datatype (while still always producing the same
      result whenever 'x' is indeed a character):

      >>> class S(str):
      ...     def __contains__(self, x):
      ...         for i in range(len(self)):
      ...             if self.startswith(x,i): return 1
      ...
      >>> s = S("The cat in the hat")
      >>> "the" in s
      1
      >>> "an" in s
      0

      Python 2.3 strings behave the same way as my datatype 'S'.

  SEE ALSO, `string`, [string], [operator], [tuple]


  EXERCISE: Filling out the forms (or deciding not to)
  --------------------------------------------------------------------

  DISCUSSION:

  A particular little task that was quite frequent and general
  before the advent of Web servers has become absolutely
  ubiquitous for slightly dynamic Web pages. The pattern one
  encounters is that one has a certain general format that is
  desired for a document or file, but miscellaneous little
  details differ from instance to instance.  Form letters are
  another common case where one comes across this pattern, but
  thematically related collections of Web pages rule the roost of
  templating techniques.

  It turns out that everyone and her sister has developed her own
  little templating system.  Creating a templating system is a
  very appealing task for users of most scripting languages, just
  a little while after they have gotten a firm grasp of "Hello
  World!" Some of these are discussed in Chapter 5, but many others
  are not addressed.  Many of these templating systems are
  HTML/CGI oriented and often include some degree of
  dynamic calculation of fill-in values--the inspiration in these
  cases comes from systems like Allaire's ColdFusion, Java Server
  Pages, Active Server Pages, and PHP, in which some program code
  gets sprinkled around in documents that are primarily made of
  HTML.

  At the very simplest, Python provides interpolation of special
  characters in strings, in a style similar to the C 'sprintf()'
  function.  So a simple example might appear like:

      >>> form_letter="""Dear %s %s,
      ...
      ... You owe us $%s for account (#%s). Please Pay.
      ...
      ... The Company"""
      >>> fname = 'David'
      >>> lname = 'Mertz'
      >>> due = 500
      >>> acct = '123-T745'
      >>> print form_letter % (fname,lname,due,acct)
      Dear David Mertz,

      You owe us $500 for account (#123-T745). Please Pay.

      The Company

  This approach does the basic templating, but it would be easy
  to make an error in composing the tuple of insertion values.
  And moreover, a slight change to the 'form_letter'
  template--such as the addition or subtraction of a field--would
  produce wrong results.
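
  Both kinds of fragility can be seen in a few lines.  The sketch
  below uses a shortened, made-up template: a missing value is at
  least reported as an exception, but values supplied in the wrong
  order pass silently.

```python
form_letter = "Dear %s %s, you owe us $%s for account (#%s)."
values = ('David', 'Mertz', 500)        # one value short of four fields
try:
    letter = form_letter % values
except TypeError:
    letter = None                       # mismatch is caught, not printed

# Swapping two values raises nothing, but the result is wrong
swapped = "Dear %s %s," % ('Mertz', 'David')
```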

  A somewhat more robust approach is to use Python's
  dictionary-based string interpolation.  For example:

      >>> form_letter="""Dear %(fname)s %(lname)s,
      ...
      ... You owe us $%(due)s for account (#%(acct)s). Please Pay.
      ...
      ... The Company"""
      >>> fields = {'lname':'Mertz', 'fname':'David'}
      >>> fields['acct'] = '123-T745'
      >>> fields['due'] = 500
      >>> fields['last_letter'] = '01/02/2001'
      >>> print form_letter % fields
      Dear David Mertz,

      You owe us $500 for account (#123-T745). Please Pay.

      The Company

  With this approach, the fields need not be listed in a
  particular order for the insertion.  Furthermore, if the order
  of fields is rearranged in the template, or if the same fields
  are used for a different template, the 'fields' dictionary may
  still be used for insertion values.  If 'fields' has unused
  dictionary keys, it doesn't hurt the interpolation, either.

  The dictionary interpolation approach is still subject to failure
  if dictionary keys are missing. The [UserDict] module can improve
  matters in two different (and incompatible) ways. In Python 2.2+
  the built-in 'dict' type can be a parent for a "new-style class";
  if it is available everywhere your code needs to run, 'dict' is a
  better parent than is `UserDict.UserDict`. One approach is to
  avoid all key misses during dictionary interpolation:

      >>> form_letter="""%(salutation)s %(fname)s %(lname)s,
      ...
      ... You owe us $%(due)s for account (#%(acct)s). Please Pay.
      ...
      ... %(closing)s The Company"""
      >>> from UserDict import UserDict
      >>> class AutoFillingDict(UserDict):
      ...     def __init__(self,dict={}): UserDict.__init__(self,dict)
      ...     def __getitem__(self,key):
      ...         return UserDict.get(self, key, '')
      >>> fields = AutoFillingDict()
      >>> fields['salutation'] = 'Dear'
      >>> fields
      {'salutation': 'Dear'}
      >>> fields['fname'] = 'David'
      >>> fields['due'] = 500
      >>> fields['closing'] = 'Sincerely,'
      >>> print form_letter % fields
      Dear David ,

      You owe us $500 for account (#). Please Pay.

      Sincerely, The Company

  Even though the fields 'lname' and 'acct' are not specified,
  the interpolation has managed to produce a basically sensible
  letter (instead of crashing with a 'KeyError').
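
  Where the built-in 'dict' type can be subclassed, the same idea
  might be sketched as below.  The class name 'AutoFill' and the
  shortened template are assumptions for illustration, not a
  replacement for the listing above.

```python
class AutoFill(dict):
    """Dictionary whose missing keys interpolate as empty strings."""
    def __getitem__(self, key):
        # Fall back to '' rather than raising KeyError
        return dict.get(self, key, '')

form_letter = "%(salutation)s %(fname)s %(lname)s, you owe us $%(due)s."
fields = AutoFill({'salutation': 'Dear', 'fname': 'David', 'due': 500})
letter = form_letter % fields
```

  The stray space before the comma in the result marks where the
  missing 'lname' would have gone.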

  Another approach is to create a custom dictionary-like object
  that will allow for "partial interpolation."  This approach is
  particularly useful to gather bits of the information needed
  for the final string over the course of the program run (rather
  than all at once):

      >>> form_letter="""%(salutation)s %(fname)s %(lname)s,
      ...
      ... You owe us $%(due)s for account (#%(acct)s). Please Pay.
      ...
      ... %(closing)s The Company"""
      >>> from UserDict import UserDict
      >>> class ClosureDict(UserDict):
      ...     def __init__(self,dict={}): UserDict.__init__(self,dict)
      ...     def __getitem__(self,key):
      ...         return UserDict.get(self, key, '%('+key+')s')
      >>> name_dict = ClosureDict({'fname':'David','lname':'Mertz'})
      >>> print form_letter % name_dict
      %(salutation)s David Mertz,

      You owe us $%(due)s for account (#%(acct)s). Please Pay.

      %(closing)s The Company

  Interpolating using a 'ClosureDict' simply fills in whatever
  portion of the information it knows, then returns a new string
  that is closer to being filled in.
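
  To make the "closer to being filled in" point concrete, the
  sketch below interpolates in two stages.  It uses a hypothetical
  'dict'-based class 'Partial' and a shortened template, and it
  assumes that every field uses the 's' conversion and that no
  filled-in value itself contains a '%' character.

```python
class Partial(dict):
    """Hypothetical dict-based variant of the partial-fill idea."""
    def __getitem__(self, key):
        # Unknown keys are re-emitted as their own placeholder
        return dict.get(self, key, '%(' + key + ')s')

template = "%(salutation)s %(fname)s %(lname)s, you owe us $%(due)s."
stage1 = template % Partial({'fname': 'David', 'lname': 'Mertz'})
stage2 = stage1 % Partial({'salutation': 'Dear', 'due': 500})
```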

  SEE ALSO, [dict], [UserDict], [UserList], [UserString]

  QUESTIONS:

  1.  What are some other ways to provide "smart" string
      interpolation? Can you think of ways that the [UserList] or
      [UserString] modules might be used to implement a similar
      enhanced interpolation?

  2.  Consider other "magic" methods that you might add to
      classes inheriting from `UserDict.UserDict`.  How might
      these additional behaviors make templating techniques more
      powerful?

  3.  How far do you think you can go in using Python's string
      interpolation as a templating technique? At what point
      would you decide you had to apply other techniques, such as
      regular expression substitutions or a parser? Why?

  4.  What sorts of error checking might you implement for
      customized interpolation? The simple tuple or dictionary
      interpolation could fail fairly easily, but at least those
      were trappable errors (they let the application know
      something is amiss).  How would you create a system with
      both flexible interpolation and good guards on the quality
      and completeness of the final result?


  PROBLEM: Working with lines from a large file
  --------------------------------------------------------------------

  At its simplest, reading a file in a line-oriented style is just
  a matter of using the '.readline()', '.readlines()', and
  '.xreadlines()' methods of a file object. Python 2.2+ provides a
  simplified syntax for this frequent operation by letting the file
  object itself efficiently iterate over lines (strictly in forward
  sequence). To read in an entire file, you may use the '.read()'
  method and possibly split it into lines or other chunks using
  the `string.split()` function. Some examples:

      >>> for line in open('chap1.txt'): # Python 2.2+
      ...     # process each line in some manner
      ...     pass
      ...
      >>> linelist = open('chap1.txt').readlines()
      >>> print linelist[1849],
        PROBLEM: Working with lines from a large file
      >>> txt = open('chap1.txt').read()
      >>> from os import linesep
      >>> linelist2 = txt.split(linesep)

  For moderately sized files, reading the entire contents is not
  a big issue. But large files make time and memory issues more
  important. Complex documents or active log files, for example,
  might be multiple megabytes, or even gigabytes, in size--even
  if the contents of such files do not strictly exceed the size of
  available memory, reading them can still be time consuming. A
  technique related to those discussed here is covered in
  Chapter 2: "Reading a file backwards by record, line, or
  paragraph."
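
  The memory side of the issue can be sketched as follows.  The
  file here is a small, generated stand-in for a large log (its
  name and contents are invented for the example); the loop holds
  only one line at a time, however large the file grows.

```python
import os, tempfile

# Build a small stand-in for a large log file (hypothetical contents)
fd, fname = tempfile.mkstemp()
os.close(fd)
fp = open(fname, 'w')
for n in range(1000):
    status = 'ok'
    if n % 100 == 0:
        status = 'ERROR'
    fp.write("line %d %s\n" % (n, status))
fp.close()

# Only one line needs to be held in memory at any moment
errors = 0
fp = open(fname)
for line in fp:
    if 'ERROR' in line:
        errors += 1
fp.close()
os.remove(fname)
```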

  Obviously, if you -need- to process every line in a file, you
  have to read the whole file; [xreadlines] does so in a
  memory-friendly way, assuming you are able to process the lines
  sequentially.  But for applications that only need a subset of
  lines in a large file, it is not hard to make improvements.
  The most important module to look to for support here is
  [linecache].

  A CACHED LINE LIST:

  It is straightforward to read a particular line from a file
  using [linecache]:

      >>> import linecache
      >>> print linecache.getline('chap1.txt',1850),
        PROBLEM: Working with lines from a large file

  Notice that `linecache.getline()` uses one-based counting, in
  contrast to the zero-based list indexing in the prior example.
  While there is not much to this, it would be even nicer to have
  an object that combined the efficiency of [linecache] with the
  interfaces we expect in lists. Code might already exist to
  process lists of lines, or you might want to write a function
  that is agnostic about the source of a list of lines. In
  addition to being able to enumerate and index, it would be
  useful to be able to slice [linecache]-based objects, just as
  we might do to real lists (including with extended slices,
  which were added to lists in Python 2.3).

      #------------------ cachedlinelist.py --------------------#
      import linecache, types
      class CachedLineList:
          # Note: in Python 2.2+, it is probably worth including:
          # __slots__ = ('_fname',)
          # ...and inheriting from 'object'
          def __init__(self, fname):
              self._fname = fname
          def __getitem__(self, x):
              if type(x) is types.SliceType:
                  return [linecache.getline(self._fname, n+1)
                          for n in range(x.start, x.stop, x.step)]
              else:
                  return linecache.getline(self._fname, x+1)
          def __getslice__(self, beg, end):
              # pass to __getitem__ which does extended slices also
              return self[beg:end:1]

  Using these new objects is almost identical to using a list
  created by 'open(fname).readlines()', but more efficient
  (especially in memory usage):

      >>> from cachedlinelist import CachedLineList
      >>> cll = CachedLineList('../chap1.txt')
      >>> cll[1849]
      '  PROBLEM: Working with lines from a large file\r\n'
      >>> for line in cll[1849:1851]: print line,
      ...
        PROBLEM: Working with lines from a large file
        ----------------------------------------------------------
      >>> for line in cll[1853:1857:2]: print line,
      ...
        a matter of using the '.readline()', '.readlines()' and
        simplified syntax for this frequent operation by letting the
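
  The class as listed expects a slice with explicit start, stop,
  and step values.  A variant that tolerates omitted parts, such
  as 'cll[:3]', might look like the sketch below.  The class name
  'CachedLines' is hypothetical; the sketch relies on
  'slice.indices()' (Python 2.3+) and on 'linecache.getlines()'
  to learn the file length, and it does not handle negative
  single offsets.

```python
import linecache, os, tempfile

class CachedLines(object):
    """Sketch of a cached line list that accepts omitted slice parts."""
    def __init__(self, fname):
        self._fname = fname
    def __len__(self):
        return len(linecache.getlines(self._fname))
    def __getitem__(self, x):
        if isinstance(x, slice):
            # slice.indices() fills in None and clips to the length
            start, stop, step = x.indices(len(self))
            return [linecache.getline(self._fname, n+1)
                    for n in range(start, stop, step)]
        return linecache.getline(self._fname, x+1)

# Demonstrate on a throwaway ten-line file
fd, fname = tempfile.mkstemp()
os.close(fd)
fp = open(fname, 'w')
fp.write(''.join(["line %d\n" % n for n in range(10)]))
fp.close()
cll = CachedLines(fname)
first_three = cll[:3]
evens = cll[2:8:2]
single = cll[9]
linecache.clearcache()
os.remove(fname)
```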

  A RANDOM LINE:

  Occasionally--especially for testing purposes--you might want
  to check "typical" lines in a line-oriented file.  It is easy
  to fall into the trap of making sure that a process works for
  the first few lines of a file, and maybe for the last few, then
  assuming it works everywhere.  Unfortunately, the first and
  last few lines of many files tend to be atypical: sometimes
  headers or footers are used; sometimes a log file's first lines
  were logged during development rather than usage; and so on.
  Then again, exhaustive testing of entire files might provide
  more data than you want to worry about.  Depending on the
  nature of the processing, complete testing could be time
  consuming as well.

  On most systems, seeking to a particular position in a file is
  far quicker than reading all the bytes up to that position.
  Even using [linecache], you need to read a file byte-by-byte up
  to the point of a cached line.  A fast approach to finding
  random lines from a large file is to seek to a random position
  within a file, then read comparatively few bytes before and
  after that position, identifying a line within that chunk.

      #-------------------- randline.py ------------------------#
      #!/usr/bin/python
      """Iterate over random lines in a file (req Python 2.2+)
      From command-line use: % randline.py <fname> <numlines>
      """
      import sys
      from os import stat, linesep
      from stat import ST_SIZE
      from random import randrange
      MAX_LINE_LEN = 4096

      #-- Iterable class
      class randline(object):
          __slots__ = ('_fp','_size','_limit')
          def __init__(self, fname, limit=sys.maxint):
              self._size = stat(fname)[ST_SIZE]
              self._fp = open(fname,'rb')
              self._limit = limit
          def __iter__(self):
              return self
          def next(self):
              if self._limit <= 0:
                  raise StopIteration
              self._limit -= 1
              pos = randrange(self._size)
              priorlen = min(pos, MAX_LINE_LEN)   # maybe near start
              self._fp.seek(pos-priorlen)
              # Add extra linesep at beg/end in case pos at beg/end
              prior = linesep + self._fp.read(priorlen)
              post = self._fp.read(MAX_LINE_LEN) + linesep
              begln = prior.rfind(linesep) + len(linesep)
              endln = post.find(linesep)
              return prior[begln:]+post[:endln]

      #-- Use as command-line tool
      if __name__=='__main__':
          fname, numlines = sys.argv[1], int(sys.argv[2])
          for line in randline(fname, numlines):
              print line

  The presented [randline] module may be used either imported
  into another application or as a command-line tool.  In the
  latter case, you could pipe a collection of random lines to
  another application, as in:

      #*---------- Piping random lines to application ----------#
      % randline.py reallybig.log 1000 | testapp

  A couple of details should be noted in my implementation. (1) The
  same line can be chosen more than once in a line iteration. If
  you choose a small number of lines from a large file, this
  probably will not happen (but the so-called "birthday paradox"
  makes an occasional collision more likely than you might expect;
  see the Glossary). (2) What is selected is "the line that
  contains a random position in the file", which means that short
  lines are less likely to be chosen than long lines. That
  distribution could be a bug or feature, depending on your needs.
  In practical terms, for testing "enough" typical cases, the
  precise distribution is not all that important.
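
  If the bias toward long lines or the possibility of repeats
  does matter, one alternative is reservoir sampling.  This is a
  different technique from the one in [randline], sketched here
  under the assumption that a full sequential pass over the file
  is affordable; the function name 'uniform_lines' is invented for
  the example.

```python
import random

def uniform_lines(lines, k):
    """Pick up to 'k' distinct lines, each equally likely."""
    sample = []
    seen = 0
    for line in lines:
        seen += 1
        if len(sample) < k:
            sample.append(line)            # fill the reservoir first
        else:
            j = random.randrange(seen)     # 0 <= j < lines seen so far
            if j < k:
                sample[j] = line           # keep with probability k/seen
    return sample
```

  A call such as 'uniform_lines(open(fname), 1000)' gives up the
  speed of seeking in exchange for an even distribution over
  lines.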

  SEE ALSO, [xreadlines], [linecache], [random]


SECTION 2 -- Standard Modules
------------------------------------------------------------------------

  There are a variety of tasks that many or most text processing
  applications will perform, but that are not themselves text
  processing tasks. For example, texts typically live inside files,
  so for a concrete application you might want to check whether
  files exist, whether you have access to them, and whether they
  have certain attributes; you might also want to read their
  contents. The text processing per se does not happen until the
  text makes it into a Python value, but getting the text into
  local memory is a necessary step.

  Another task is making Python objects persistent so that final or
  intermediate processing results can be saved in computer-usable
  forms. Or again, Python applications often benefit from being
  able to call external processes and possibly work with the
  results of those calls.

  Yet another class of modules helps you deal with Python internals
  in ways that go beyond what the inherent syntax does. I have made
  a judgment call in this book as to which such "Python
  internal" modules are sufficiently general and frequently used
  in text processing applications; a number of "internal" modules
  are given only one-line descriptions under the "Other Modules"
  topic.

  TOPIC -- Working with the Python Interpreter
  --------------------------------------------------------------------

  Some of the modules in the standard library contain functionality
  that is nearly as important to Python as the basic syntax. Such
  modularity is an important strength of Python's design, but users
  of other languages may be surprised to find capabilities for
  reading command-line arguments, catching exceptions, copying
  objects, or the like in external modules.

  =================================================================
    MODULE -- copy : Generic copying operations
  =================================================================

  Names in Python programs are merely bindings to underlying
  objects; many of these objects are mutable. This point is simple,
  but it winds up biting almost every beginning Python
  programmer--and even a few experienced Pythoners get caught, too.
  The problem is that binding another name (including a sequence
  position, dictionary entry, or attribute) to an object leaves you
  with two names bound to the same object. If you change the
  underlying object using one name, the other name also points to a
  changed object. Sometimes you want that, sometimes you do not.

  One variant of the binding trap is a particularly frequent
  pitfall. Say you want a 2D table of values, initialized as
  zeros.  Later on, you would like to be able to refer to a
  row/column position as, for example, 'table[2][3]' (as in many
  programming languages).  Here is what you would probably try
  first, along with its failure:

      >>> row = [0]*4
      >>> print row
      [0, 0, 0, 0]
      >>> table = [row]*4   # or 'table = [[0]*4]*4'
      >>> for row in table: print row
      ...
      [0, 0, 0, 0]
      [0, 0, 0, 0]
      [0, 0, 0, 0]
      [0, 0, 0, 0]
      >>> table[2][3] = 7
      >>> for row in table: print row
      ...
      [0, 0, 0, 7]
      [0, 0, 0, 7]
      [0, 0, 0, 7]
      [0, 0, 0, 7]
      >>> id(table[2]), id(table[3])
      (6207968, 6207968)

  The problem with the example is that 'table' is a list of
  four positional bindings to the -exact same- list object.
  You cannot change just one row, since all four point to just
  one object.  What you need instead is a -copy- of 'row' to put
  in each row of 'table'.

  Python provides a number of ways to create copies of objects (and
  bind them to names). Such a copy is a "snapshot" of the state of
  the object that can be modified independently of changes to the
  original. A few ways to correct the table problem are:

      >>> table1 = map(list, [(0,)*4]*4)
      >>> id(table1[2]), id(table1[3])
      (6361712, 6361808)
      >>> table2 = [lst[:] for lst in [[0]*4]*4]
      >>> id(table2[2]), id(table2[3])
      (6356720, 6356800)
      >>> from copy import copy
      >>> row = [0]*4
      >>> table3 = map(copy, [row]*4)
      >>> id(table3[2]), id(table3[3])
      (6498640, 6498720)

  In general, slices always create new lists. In Python 2.2+, the
  constructors 'list()' and 'dict()' likewise construct new/copied
  lists/dicts (possibly using other sequence or association types
  as arguments).

  But the most general way to make a new copy of -whatever
  object you might need- is with the [copy] module.  If you use
  the [copy] module you do not need to worry about issues of
  whether a given sequence is a list, or merely list-like, which
  the 'list()' coercion forces into a list.
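
  The difference between coercing and copying can be sketched as
  follows. The example is only an illustration: 'collections.deque'
  is an assumed stand-in for a "merely list-like" sequence (it is
  not discussed in this chapter and needs Python 2.4+), and the
  names 'coerced' and 'copied' are hypothetical.

```python
import copy
from collections import deque   # assumed stand-in for a list-like type

row = deque([0, 0, 0, 0])

coerced = list(row)       # always a plain list, whatever 'row' was
copied = copy.copy(row)   # same type as 'row', but a distinct object

# Build the 2D table from independent copies of the template row
table = [copy.copy(row) for _ in range(4)]
table[2][3] = 7           # changes one row only
```

  After this runs, 'coerced' is a list, 'copied' is still a deque,
  and only 'table[2]' contains the value 7.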

  FUNCTIONS:

  copy.copy(obj)
      Return a shallow copy of a Python object.  Most (but not
      quite all) types of Python objects can be copied.  A
      shallow copy binds its elements/members to the same objects
      as bound in the original--but the object itself is
      distinct.

      >>> import copy
      >>> class C: pass
      ...
      >>> o1 = C()
      >>> o1.lst = [1,2,3]
      >>> o1.str = "spam"
      >>> o2 = copy.copy(o1)
      >>> o1.lst.append(17)
      >>> o2.lst
      [1, 2, 3, 17]
      >>> o1.str = 'eggs'
      >>> o2.str
      'spam'

  copy.deepcopy(obj)
      Return a deep copy of a Python object.  Each element or
      member in an object is itself recursively copied.  For
      nested containers, it is usually more desirable to perform
      a deep copy--otherwise you can run into problems like the
      2D table example above.

      >>> o1 = C()
      >>> o1.lst = [1,2,3]
      >>> o3 = copy.deepcopy(o1)
      >>> o1.lst.append(17)
      >>> o3.lst
      [1, 2, 3]
      >>> o1.lst
      [1, 2, 3, 17]
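
      Applied to the 2D table from the start of this module
      section, the contrast between the two functions might look
      like the sketch below (the names 'shallow' and 'deep' are
      illustrative only):

```python
import copy

table = [[0] * 4 for _ in range(4)]   # four distinct row objects

shallow = copy.copy(table)      # new outer list, rows still shared
deep = copy.deepcopy(table)     # new outer list and new rows

table[2][3] = 7                 # visible through 'shallow', not 'deep'
```

      Here 'shallow[2]' is the very same object as 'table[2]',
      so it shows the 7; 'deep[2]' is an independent row of
      zeros.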

  =================================================================
    MODULE -- exceptions : Standard exception class hierarchy
  =================================================================

  Various actions in Python raise exceptions, and these
  exceptions can be caught using an 'except' clause.  Although
  strings can serve as exceptions for backwards-compatibility
  reasons, it is greatly preferable to use class-based
  exceptions.

  When you catch an exception using an 'except' clause, you also
  catch any descendant exceptions.  By utilizing a hierarchy of
  standard and user-defined exception classes, you can tailor
  exception handling to meet your specific code requirements.

      >>> class MyException(StandardError): pass
      ...
      >>> try:
      ...     raise MyException
      ... except StandardError:
      ...     print "Caught parent"
      ... except MyException:
      ...     print "Caught specific class"
      ... except:
      ...     print "Caught generic leftover"
      ...
      Caught parent
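
  Because the first matching 'except' clause wins, the usual
  remedy is to list the most specific classes first.  The sketch
  below assumes a hypothetical 'MissingFieldError' class and uses
  'LookupError' as its parent (rather than 'StandardError') so
  that it also runs on later Python versions; the 'classify()'
  helper is likewise only illustrative.

```python
class MissingFieldError(LookupError):   # hypothetical application exception
    pass

def classify(exc_class):
    """Raise exc_class and report which 'except' clause caught it."""
    try:
        raise exc_class("demo")
    except MissingFieldError:     # most specific class first
        return "specific"
    except LookupError:           # then its built-in parent
        return "parent"
    except Exception:             # then everything else
        return "generic"

results = [classify(c) for c in (MissingFieldError, KeyError, ValueError)]
```

  With this ordering, each exception is reported by the narrowest
  clause that applies to it: "specific", "parent", and "generic",
  respectively.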


  In general, if you need to raise exceptions manually, you should
  either use a built-in exception close to your situation, or
  inherit from that built-in exception.  The outline in Figure 1.1
  shows the exception classes defined in [exceptions].

  !!!

      #----- Standard exceptions -----#
      <<exception_hierarchy.eps>>


  =================================================================
    MODULE -- getopt : Parser for command line options
  =================================================================

  Utility applications--whether for text processing or
  otherwise--frequently accept a variety of command-line switches
  to configure their behavior. In principle, and frequently in
  practice, all that you need to do to process command-line options
  is read through the list 'sys.argv[1:]' and handle each element
  of the option line. I have certainly written my own small
  "sys.argv parser" more than once; it is not hard if you do not
  expect too much.

  The [getopt] module provides some automation and error handling
  for option parsing.  It takes just a few lines of code to tell
  [getopt] what options it should handle, and which switch
  prefixes and parameter styles to use.  However, [getopt] is not
  necessarily the final word in parsing command lines.  Python
  2.3 includes Greg Ward's [optik] module
  <http://optik.sourceforge.net/> renamed as [optparse], and the
  Twisted Matrix library contains [twisted.python.usage]
  <http://twistedmatrix.com/documents/howto/options>.  These
  modules, and other third-party tools, were written because of
  perceived limitations in [getopt].

  For most purposes, [getopt] is a perfectly good tool. Moreover,
  even if some enhanced module is included in later Python
  versions, either this enhancement will be backwards compatible
  or [getopt] will remain in the distribution to support existing
  scripts.

  SEE ALSO, `sys.argv`

  FUNCTIONS:

  getopt.getopt(args, options [,long_options])
      The argument 'args' is the actual list of options being
      parsed, most commonly 'sys.argv[1:]'.  The argument
      'options' and the optional argument 'long_options' contain
      formats for acceptable options.  If any options specified
      in 'args' do not match any acceptable format, a
      `getopt.GetoptError` exception is raised.  All options must
      begin with either a single dash for single-letter options
      or a double dash for long options (DOS-style leading slashes
      are not usable, unfortunately).

      The return value of `getopt.getopt()` is a pair containing
      an option list and a list of additional arguments.  The
      latter is typically a list of filenames the utility will
      operate on.  The option list is a list of pairs of the form
      '(option, value)'.  Under recent versions of Python, you
      can convert an option list to a dictionary with
      'dict(optlist)', which is likely to be useful.

      The 'options' format string is a sequence of letters, each
      optionally followed by a colon.  Any option letter followed
      by a colon takes a (mandatory) value after the option.

      The format for 'long_options' is a list of strings
      indicating the option names (excluding the leading dashes).
      If an option name ends with an equal sign, it requires a
      value after the option.

  It is easiest to see [getopt] in action:

      >>> import getopt
      >>> opts='-a1 -b -c 2 --foo=bar --baz file1 file2'.split()
      >>> optlist, args = getopt.getopt(opts,'a:bc:',['foo=','baz'])
      >>> optlist
      [('-a', '1'), ('-b', ''), ('-c', '2'), ('--foo', 'bar'),
      ('--baz', '')]
      >>> args
      ['file1', 'file2']
      >>> nodash = lambda s: \
      ...          s.translate(''.join(map(chr,range(256))),'-')
      >>> todict = lambda l: \
      ...          dict([(nodash(opt),val) for opt,val in l])
      >>> optdict = todict(optlist)
      >>> optdict
      {'a': '1', 'c': '2', 'b': '', 'baz': '', 'foo': 'bar'}

  You can examine options given either by looping through 'optlist'
  or by performing 'optdict.get(key, default)' type tests as needed
  in your program flow.
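
  A script will usually also want to trap unacceptable options.
  The following is a minimal sketch, assuming a hypothetical
  'parse()' helper and the same option formats as above; note
  that 'dict(optlist)' keeps the leading dashes in its keys. The
  'except ... as' spelling requires Python 2.6 or later.

```python
import getopt

def parse(argv):
    """Return (optdict, args), or (None, message) on a bad option."""
    try:
        optlist, args = getopt.getopt(argv, 'a:bc:', ['foo=', 'baz'])
    except getopt.GetoptError as err:
        return None, str(err)
    return dict(optlist), args

good = parse('-a1 -b --foo=bar file1 file2'.split())
bad = parse(['-x'])     # '-x' is not among the acceptable options
```

  For the acceptable command line, 'good' holds the dictionary
  '{'-a': '1', '-b': '', '--foo': 'bar'}' and the remaining
  arguments; for the unacceptable one, the first item of 'bad' is
  'None' and the second is the error message from [getopt].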

  =================================================================
    MODULE -- operator : Standard operations as functions
  =================================================================

  All of the standard Python syntactic operators are available in
  functional form using the [operator] module.  In most cases, it
  is more clear to use the actual operators, but in a few cases
  functions are useful.  The most common usage for [operator] is
  in conjunction with functional programming constructs.  For
  example:

      >>> import operator
      >>> lst = [1, 0, (), '', 'abc']
      >>> map(operator.not_, lst)   # fp-style negated bool vals
      [0, 1, 1, 1, 0]
      >>> tmplst = []               # imperative style
      >>> for item in lst:
      ...     tmplst.append(not item)
      ...
      >>> tmplst
      [0, 1, 1, 1, 0]
      >>> del tmplst                # must cleanup stray name

  As well as being shorter, I find the FP style more clear.  The
  source code below provides -sample- implementations of the
  functions in the [operator] module.  The actual implementations
  are faster and are written directly in C, but the samples
  illustrate what each function does.

      #------------------ operator2.py -------------------------#
      ### Comparison functions
      lt = __lt__ = lambda a,b: a < b
      le = __le__ = lambda a,b: a <= b
      eq = __eq__ = lambda a,b: a == b
      ne = __ne__ = lambda a,b: a != b
      ge = __ge__ = lambda a,b: a >= b
      gt = __gt__ = lambda a,b: a > b
      ### Boolean functions
      not_ = __not__ = lambda o: not o
      truth = lambda o: not not o
      ### Arithmetic functions
      abs = __abs__ = abs   # same as built-in function
      add = __add__ = lambda a,b: a + b
      and_ = __and__ = lambda a,b: a & b  # bitwise, not boolean
      div = __div__ = \
            lambda a,b: a/b  # depends on __future__.division
      floordiv = __floordiv__ = lambda a,b: a//b # Only for 2.2+
      inv = invert = __inv__ = __invert__ = lambda o: ~o
      lshift = __lshift__ = lambda a,b: a << b
      rshift = __rshift__ = lambda a,b: a >> b
      mod = __mod__ = lambda a,b: a % b
      mul = __mul__ = lambda a,b: a * b
      neg = __neg__ = lambda o: -o
      or_ = __or__ = lambda a,b: a | b    # bitwise, not boolean
      pos = __pos__ = lambda o: +o # identity for numbers
      sub = __sub__ = lambda a,b: a - b
      truediv = __truediv__ = lambda a,b: 1.0*a/b # New in 2.2+
      xor = __xor__ = lambda a,b: a ^ b
      ### Sequence functions (note overloaded syntactic operators)
      concat = __concat__ = add
      contains = __contains__ = lambda a,b: b in a
      countOf = lambda seq,a: len([x for x in seq if x==a])
      def delitem(seq,a): del seq[a]
      __delitem__ = delitem
      def delslice(seq,b,e): del seq[b:e]
      __delslice__ = delslice
      getitem = __getitem__ = lambda seq,i: seq[i]
      getslice = __getslice__ = lambda seq,b,e: seq[b:e]
      indexOf = lambda seq,o: seq.index(o)
      repeat = __repeat__ = mul
      def setitem(seq,i,v): seq[i] = v
      __setitem__ = setitem
      def setslice(seq,b,e,v): seq[b:e] = v
      __setslice__ = setslice
      ### Functionality functions (not implemented here)
      # The precise interfaces required to pass the below tests
      #     are ill-defined, and might vary at limit-cases between
      #     Python versions and custom data types.
      import operator
      isCallable = callable     # just use built-in 'callable()'
      isMappingType = operator.isMappingType
      isNumberType = operator.isNumberType
      isSequenceType = operator.isSequenceType
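
  A few of the functions sketched above can be spot-checked
  against the real [operator] module.  The example below is only
  an illustration; it imports 'reduce()' from 'functools'
  (available in Python 2.6+) so that it also runs where
  'reduce()' is no longer a built-in.

```python
import operator
from functools import reduce   # a plain built-in in older Pythons

nums = [1, 2, 3, 4]
total = reduce(operator.add, nums)        # 1 + 2 + 3 + 4
product = reduce(operator.mul, nums)      # 1 * 2 * 3 * 4
joined = reduce(operator.concat, ['spam', 'and', 'eggs'])
negated = [operator.not_(x) for x in [1, 0, (), '', 'abc']]
has_b = operator.contains('abc', 'b')     # same as: 'b' in 'abc'
ones = operator.countOf([1, 2, 1], 1)     # how many items equal 1
```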

  =================================================================
    MODULE -- sys : Information about current Python interpreter
  =================================================================

  As with the Python "userland" objects you create within your
  applications, the Python interpreter itself is very open to
  introspection.  Using the [sys] module, you can examine and
  modify many aspects of the Python runtime environment.
  However, as with much of the functionality in the [os] module, some
  of what [sys] provides is too esoteric to address in this book
  about text processing.  Consult the _Python Library Reference_
  for information on those attributes and functions not covered
  here.

  The module attributes `sys.exc_type`, `sys.exc_value`, and
  `sys.exc_traceback` have been deprecated in favor of the function
  `sys.exc_info()`. All of these, and also `sys.last_type`,
  `sys.last_value`, `sys.last_traceback`, and `sys.tracebacklimit`,
  let you poke into exceptions and stack frames to a finer degree
  than the basic `try` and `except` statements do.
  `sys.exec_prefix` and `sys.executable` provide information on
  installed paths for Python.

  The functions `sys.displayhook()` and `sys.excepthook()` control
  where program output goes, and `sys.__displayhook__` and
  `sys.__excepthook__` retain their original values (e.g., STDOUT
  and STDERR). `sys.exitfunc` affects interpreter cleanup. The
  attributes `sys.ps1` and `sys.ps2` control prompts in the Python
  interactive shell.

  Other attributes and methods simply provide more detail than
  you almost ever need to know for text processing applications.
  The attributes `sys.dllhandle` and `sys.winver` are Windows
  specific; `sys.setdlopenflags()`, and `sys.getdlopenflags()`
  are Unix only. Names like `sys.builtin_module_names`,
  `sys.getrecursionlimit()`, `sys._getframe()`,
  `sys.setcheckinterval()`, `sys.modules`, `sys.prefix`,
  `sys.setprofile()`, `sys.setrecursionlimit()`, `sys.settrace()`,
  and `sys.warnoptions` concern Python internals. Unicode behavior
  is affected by `sys.setdefaultencoding()` (but is overridable
  with arguments anyway).


  ATTRIBUTES:

  sys.argv
      A list of command-line arguments passed to a Python
      script. The first item, 'argv[0]', is the script name itself,
      so you are normally interested in 'argv[1:]' when parsing
      arguments.

      SEE ALSO, [getopt], `sys.stdin`, `sys.stdout`

  sys.byteorder
      The native byte order (endianness) of the current platform.
      Possible values are 'big' and 'little'.  Available in
      Python 2.0+.

  sys.copyright
      A string with copyright information for the current Python
      interpreter.

  sys.hexversion
      The version number of the current Python interpreter
      as an integer.  This number increases with every version,
      even nonproduction releases.  This attribute is not very
      human-readable; `sys.version` or `sys.version_info` is
      generally easier to work with.

      SEE ALSO, `sys.version`, `sys.version_info`
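
      If you do need to read `sys.hexversion`, its fields can be
      unpacked with shifts and masks.  The sketch below assumes
      the layout used by CPython (one byte each for major, minor,
      and micro, then four bits each for release level and
      serial); the variable names are illustrative.

```python
import sys

hv = sys.hexversion
major = (hv >> 24) & 0xff
minor = (hv >> 16) & 0xff
micro = (hv >> 8) & 0xff
level = (hv >> 4) & 0xf     # 0xa alpha, 0xb beta, 0xc candidate, 0xf final
serial = hv & 0xf

# The integer form makes a minimum-version test a single comparison
at_least_2_2 = hv >= 0x02020000
```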

  sys.maxint
      The largest positive integer supported by Python's regular
      integer type; on most platforms, this is 2**31-1. The largest
      negative integer is -sys.maxint-1.

  sys.maxunicode
      The integer of the largest supported code point for a
      Unicode character under the current configuration.
      Unicode characters are stored as UCS-2 or UCS-4.

  sys.path
      A list of the pathnames searched for modules.  You may
      modify this path to control module loading.

  sys.platform
      A string identifying the OS platform.

      SEE ALSO, `os.uname()`

  sys.stderr
  sys.__stderr__
      File object for standard error stream (STDERR).
      `sys.__stderr__` retains the original value in case
      `sys.stderr` is modified during program execution.  Error
      messages and warnings from the Python interpreter are
      written to `sys.stderr`.  The most typical use of
      `sys.stderr` is for application messages that indicate
      "abnormal" conditions.  For example:

      #*------ Typical usage of sys.stderr and sys.stdout -----#
      % cat cap_file.py
      #!/usr/bin/env python
      import sys, string
      if len(sys.argv) < 2:
          sys.stderr.write("No filename specified\n")
      else:
          fname = sys.argv[1]
          try:
              input = open(fname).read()
              sys.stdout.write(string.upper(input))
          except:
              sys.stderr.write("Could not read '%s'\n" % fname)
      % ./cap_file.py this > CAPS
      % ./cap_file.py nosuchfile > CAPS
      Could not read 'nosuchfile'
      % ./cap_file.py > CAPS
      No filename specified

      SEE ALSO, `sys.argv`, `sys.stdin`, `sys.stdout`

  sys.stdin
  sys.__stdin__
      File object for standard input stream (STDIN).
      `sys.__stdin__` retains the original value in case
      `sys.stdin` is modified during program execution.
      `input()` and `raw_input()` read from `sys.stdin`, but
      the most typical use of `sys.stdin` is for piped and
      redirected streams on the command line.  For example:

      #*-------------- Typical usage of sys.stdin -------------#
      % cat cap_stdin.py
      #!/usr/bin/env python
      import sys, string
      input = sys.stdin.read()
      print string.upper(input)
      % echo "this and that" | ./cap_stdin.py
      THIS AND THAT

      SEE ALSO, `sys.argv`, `sys.stderr`, `sys.stdout`

  sys.stdout
  sys.__stdout__
      File object for standard output stream (STDOUT).
      `sys.__stdout__` retains the original value in case
      `sys.stdout` is modified during program execution.  The
      formatted output of the `print` statement goes to
      `sys.stdout`, and you may also use regular file methods,
      such as `sys.stdout.write()`.

      SEE ALSO, `sys.argv`, `sys.stderr`, `sys.stdin`
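
      Since `sys.stdout` is an ordinary name bound to a file-like
      object, it can be rebound temporarily to capture output.
      The following is a minimal sketch; 'capture()' and
      'report()' are hypothetical helpers, and the guarded import
      is there only so the example runs on both older and newer
      Python versions.

```python
import sys
try:
    from cStringIO import StringIO   # older Pythons
except ImportError:
    from io import StringIO          # newer Pythons

def capture(func):
    """Call func() and return whatever it wrote to sys.stdout."""
    saved = sys.stdout
    sys.stdout = buf = StringIO()
    try:
        func()
    finally:
        sys.stdout = saved    # sys.__stdout__ would give the original stream
    return buf.getvalue()

def report():                 # hypothetical function that writes to STDOUT
    sys.stdout.write("this and that\n")

text = capture(report)
```

      Restoring the saved object, rather than `sys.__stdout__`,
      keeps the helper well behaved when some outer code has
      already redirected output.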

  sys.version
      A string containing version information on the current
      Python interpreter.  The form of the string is
      'version (#build_num, build_date, build_time) [compiler]'.
      For example:

      >>> print sys.version
      1.5.2 (#0 Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)]

      Or:

      >>> print sys.version
      2.2 (#1, Apr 17 2002, 16:11:12)
      [GCC 2.95.2 19991024 (release)]

      This version-independent way to find the major, minor, and
      micro version components should work for 1.5-2.3.x (at
      least):

      >>> from string import split
      >>> from sys import version
      >>> ver_tup = map(int, split(split(version)[0],'.'))+[0]
      >>> major, minor, point = ver_tup[:3]
      >>> if (major, minor) >= (1, 6):
      ...     print "New Way"
      ... else:
      ...     print "Old Way"
      ...
      New Way

  sys.version_info
      A 5-tuple containing the components of the version number
      of the current Python interpreter: '(major, minor, micro,
      releaselevel, serial)'.  'releaselevel' is a descriptive
      phrase; the others are integers.

      >>> sys.version_info
      (2, 2, 0, 'final', 0)

      Unfortunately, this attribute was only added in Python 2.0,
      so it cannot be relied on to require a minimal version when
      a script might also run under an older interpreter.

      SEE ALSO, `sys.version`
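
      Where the attribute is available, tuple comparison makes a
      minimum-version check compact.  In the sketch below, the
      fallback tuple for interpreters that lack the attribute is
      an assumption chosen for illustration, as is the name
      'has_generators'.

```python
import sys

# Assumed fallback for interpreters older than 2.0, which lack the attribute
info = getattr(sys, 'version_info', (1, 5, 2, 'final', 0))

# Tuples compare item by item, so this reads as "2.2 or later"
has_generators = tuple(info[:2]) >= (2, 2)
major, minor = info[0], info[1]
```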

  FUNCTIONS:

  sys.exit([code=0])
      Exit Python with exit code 'code'. Cleanup actions specified
      by 'finally' clauses of 'try' statements are honored, and it
      is possible to intercept the exit attempt by catching the
      SystemExit exception.  You may specify a numeric exit code
      for those systems that codify them; you may also specify a
      string exit code, which is printed to STDERR (with the actual
      exit code set to 1).
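
      The interaction of 'finally' clauses and 'SystemExit' can
      be sketched as follows; the 'events' list and the 'run()'
      function are hypothetical, and the 'except ... as' spelling
      requires Python 2.6 or later.

```python
import sys

events = []

def run():
    try:
        sys.exit(3)
    finally:
        events.append('cleanup')    # runs before the exit propagates

try:
    run()
except SystemExit as exc:           # intercept the exit attempt
    events.append('intercepted %r' % (exc.code,))
```

      The cleanup entry is recorded first, then the intercepted
      exit code; without the outer 'except' clause, the
      interpreter would exit with code 3.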

  sys.getdefaultencoding()
      Return the name of the default Unicode string encoding in
      Python 2.0+.

  sys.getrefcount(obj)
      Return the number of references to the object 'obj'. The
      value returned is one higher than you might expect, because
      it includes the (temporary) reference passed as the
      argument.

      >>> x = y = "hi there"
      >>> import sys
      >>> sys.getrefcount(x)
      3
      >>> lst = [x, x, x]
      >>> sys.getrefcount(x)
      6

  SEE ALSO, [os]

  =================================================================
    MODULE -- types : Standard Python object types
  =================================================================

  Every object in Python has a type; you can find it by using the
  built-in function `type()`. Often Python functions use a sort of
  -ad hoc- overloading, which is implemented by checking features
  of objects passed as arguments. Programmers coming from languages
  like C or Java are sometimes surprised by this style, since they
  are accustomed to seeing multiple "type signatures" for each set
  of argument types the function can accept. But that is not the
  Python way.

  Experienced Python programmers try not to rely on the precise
  types of objects, not even in an inheritance sense. This attitude
  is also sometimes surprising to programmers of other languages
  (especially statically typed).  What is usually important to a
  Python program is what an object can -do-, not what it -is-.
  In fact, it has become much more complicated to describe what
  many objects -are- with the "type/class unification" in Python
  2.2 and above (the details are outside the scope of this book).

  For example, you might be inclined to write an overloaded
  function in the following manner:

      #-------- Naive overloading of argument ---------#
      import types, exceptions
      def overloaded_get_text(o):
          if type(o) is types.FileType:
              text = o.read()
          elif type(o) is types.StringType:
              text = o
          elif type(o) in (types.IntType, types.FloatType,
                           types.LongType, types.ComplexType):
              text = repr(o)
          else:
              raise exceptions.TypeError
          return text

  The problem with this rigidly typed code is that it is far more
  fragile than is necessary. Something need not be an actual
  'FileType' to read its text; it just needs to be sufficiently
  "file-like" (e.g., a `urllib.urlopen()` or `cStringIO.StringIO()`
  object is file-like enough for this purpose). Similarly, a
  new-style object that descends from `types.StringType` or a
  `UserString.UserString()` object is "string-like" enough to
  return as such, and the same holds for number-like objects.

  A better implementation of the function above is:

      #---- "Quacks like a duck" overloading of argument -----#
      def overloaded_get_text(o):
          if hasattr(o,'read'):
              return o.read()
          try:
              return ""+o
          except TypeError:
              pass
          try:
              return repr(0+o)
          except TypeError:
              pass
          raise TypeError("cannot get text from %r" % (o,))
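
  To see the capability-based version at work, the sketch below
  restates the function under the hypothetical name 'get_text()'
  (with an explicit final 'raise TypeError') and feeds it a
  file-like object, a string subclass, and a number.  The class
  'Shout' is invented for the example.

```python
try:
    from cStringIO import StringIO   # older Pythons
except ImportError:
    from io import StringIO          # newer Pythons

def get_text(o):
    # Restatement of the function above with an explicit final raise
    if hasattr(o, 'read'):
        return o.read()
    try:
        return "" + o                # works for anything string-like
    except TypeError:
        pass
    try:
        return repr(0 + o)           # works for anything number-like
    except TypeError:
        pass
    raise TypeError("cannot get text from %r" % (o,))

class Shout(str):                    # hypothetical string subclass
    pass

from_file = get_text(StringIO("spam"))
from_str = get_text(Shout("eggs"))
from_num = get_text(17)
```

  None of the three arguments is a 'FileType', and only the last
  is one of the exact built-in types, yet each is handled; a list
  argument would still raise 'TypeError'.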

  At times, nonetheless, it is useful to have symbolic names
  available to name specific object types.  In many such cases,
  an empty or minimal version of the type of object may be used
  in conjunction with the `type()` function equally well--the
  choice is mostly stylistic:

      >>> type('') == types.StringType
      1
      >>> type(0.0) == types.FloatType
      1
      >>> type(None) == types.NoneType
      1
      >>> type([]) == types.ListType
      1

  BUILT-IN:

  type(o)
      Return the datatype of any object 'o'.  The return value of
      this function is itself an object of the type
      `types.TypeType`.  TypeType objects implement '.__str__()'
      and '.__repr__()' methods to create readable descriptions
      of object types.

      >>> print type(1)
      <type 'int'>
      >>> print type(type(1))
      <type 'type'>
      >>> type(1) is type(0)
      1

  CONSTANTS:

  types.BuiltinFunctionType
  types.BuiltinMethodType
      The type for built-in functions like `abs()`, `len()`, and
      `dir()`, and for functions in "standard" C extensions like
      [sys] and [os].  However, extensions like [string] and [re]
      are actually Python wrappers for C extensions, so their
      functions are of type `types.FunctionType`.  A general Python
      programmer need not worry about these fussy details.

  types.BufferType
      The type for objects created by the built-in buffer()
      function.

  types.ClassType
      The type for user-defined classes.

      >>> from operator import eq
      >>> from types import *
      >>> class C:
      ...     def foo(self): pass
      ...
      >>> map(eq, [type(C), type(C()), type(C().foo)],
      ...         [ClassType, InstanceType, MethodType])
      [1, 1, 1]

      SEE ALSO, `types.InstanceType`, `types.MethodType`

  types.CodeType
      The type for code objects such as returned by 'compile()'.

  types.ComplexType
      Same as 'type(0+0j)'.

  types.DictType
  types.DictionaryType
      Same as 'type({})'.

  types.EllipsisType
      The type for the built-in Ellipsis object.

  types.FileType
      The type for open file objects.

      >>> from sys import stdout
      >>> fp = open('tst','w')
      >>> [type(stdout), type(fp)] == [types.FileType]*2
      1

  types.FloatType
      Same as 'type(0.0)'.

  types.FrameType
      The type for frame objects like 'tb.tb_frame' where 'tb'
      has type `types.TracebackType`.

  types.FunctionType
  types.LambdaType
      Same as 'type(lambda:0)'.

  types.GeneratorType
      The type for generator-iterator objects in Python 2.2+.

      >>> from __future__ import generators
      >>> def foo(): yield 0
      ...
      >>> type(foo) == types.FunctionType
      1
      >>> type(foo()) == types.GeneratorType
      1

      SEE ALSO, `types.FunctionType`

  types.InstanceType
      The type for instances of user-defined classes.

      SEE ALSO, `types.ClassType`, `types.MethodType`

  types.IntType
      Same as 'type(0)'.

  types.ListType
      Same as 'type([])'.

  types.LongType
      Same as 'type(0L)'.

  types.MethodType
  types.UnboundMethodType
      The type for methods of user-defined class instances.

      SEE ALSO, `types.ClassType`, `types.InstanceType`

  types.ModuleType
      The type for modules.

      >>> import os, re, sys
      >>> [type(os), type(re), type(sys)] == [types.ModuleType]*3
      1

  types.NoneType
      Same as 'type(None)'.

  types.StringType
      Same as 'type(" ")'.

  types.TracebackType
      The type for traceback objects found in `sys.exc_traceback`.

  types.TupleType
      Same as 'type(())'.

  types.UnicodeType
      Same as 'type(u"")'.

  types.SliceType
      The type for objects returned by 'slice()'.

  types.StringTypes
      Same as '(types.StringType,types.UnicodeType)'.

      SEE ALSO, `types.StringType`, `types.UnicodeType`

  types.TypeType
      Same as 'type(type(obj))' (for any 'obj').

  types.XRangeType
      Same as 'type(xrange(1))'.


  TOPIC -- Working with the Local Filesystem
  --------------------------------------------------------------------

  =================================================================
    MODULE -- dircache : Read and cache directory listings
  =================================================================

  The [dircache] module is an enhanced version of the
  `os.listdir()` function. Unlike the [os] function, [dircache]
  keeps prior directory listings in memory to avoid the need for a
  new call to the filesystem. Since [dircache] is smart enough to
  check whether a directory has been touched since last caching,
  [dircache] is a complete replacement for `os.listdir()` (with
  possible minor speed gains).
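
  The caching idea can be sketched in a few lines.  What follows
  is not the [dircache] source, only an illustration of the
  technique under the assumption that a directory's modification
  time changes whenever its contents do; 'cached_listdir()' and
  '_cache' are hypothetical names.

```python
import os, shutil, tempfile

_cache = {}   # path -> (mtime, listing); hypothetical module-level cache

def cached_listdir(path):
    """Return a sorted listing of 'path', reused while mtime is unchanged."""
    mtime = os.stat(path).st_mtime
    entry = _cache.get(path)
    if entry is not None and entry[0] == mtime:
        return entry[1]                 # directory untouched: reuse listing
    listing = sorted(os.listdir(path))  # otherwise ask the filesystem again
    _cache[path] = (mtime, listing)
    return listing

demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'spam.txt'), 'w').close()
first = cached_listdir(demo_dir)
second = cached_listdir(demo_dir)       # served from the cache
shutil.rmtree(demo_dir)
```

  The second call returns the very same list object as the first,
  since nothing touched the directory in between.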

  FUNCTIONS:

  dircache.listdir(path)
      Return a directory listing of path 'path'.  Uses a list
      cached in memory where possible.

  dircache.opendir(path)
      Identical to `dircache.listdir()`.  Legacy function to
      support old scripts.

  dircache.annotate(path, lst)
      Modify the list 'lst' in place to indicate which items are
      directories, and which plain files.  The string 'path'
      should indicate the path to reach the listed files.

      >>> import dircache
      >>> l = dircache.listdir('/tmp')
      >>> l
      ['501', 'md10834.db']
      >>> dircache.annotate('/tmp', l)
      >>> l
      ['501/', 'md10834.db']

  =================================================================
    MODULE -- filecmp : Compare files and directories
  =================================================================

  The [filecmp] module lets you check whether two files are
  identical, and whether two directories contain some identical
  files.  You have several options in determining how thorough a
  comparison is performed.

  FUNCTIONS:

  filecmp.cmp(fname1, fname2 [,shallow=1 [,use_statcache=0]])
      Compare the file named by the string 'fname1' with the file
      named by the string 'fname2'.  If the default true value of
      'shallow' is used, the comparison is based only on the
      mode, size, and modification time of the two files.  If
      'shallow' is a false value, the files are compared byte by
      byte.  Unless you are concerned that someone will
      deliberately falsify timestamps on files (as in a
      cryptography context), a shallow comparison is quite
      reliable.  However, 'tar' and 'untar' can also change
      timestamps.

      >>> import filecmp
      >>> filecmp.cmp('dir1/file1', 'dir2/file1')
      0
      >>> filecmp.cmp('dir1/file2', 'dir2/file2', shallow=0)
      1

      The 'use_statcache' argument is not relevant for Python
      2.2+.  In older Python versions, the [statcache] module
      provided (slightly) more efficient cached access to file
      stats, but its use is no longer needed.
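
      A byte-by-byte comparison can be tried out safely on
      throwaway files.  The sketch below is only an illustration;
      the 'write()' helper and the filenames are hypothetical,
      and the files live in a temporary directory that is removed
      afterwards.

```python
import filecmp, os, shutil, tempfile

def write(path, text):
    f = open(path, 'w')
    f.write(text)
    f.close()

tmp = tempfile.mkdtemp()
a = os.path.join(tmp, 'a.txt')
b = os.path.join(tmp, 'b.txt')
c = os.path.join(tmp, 'c.txt')
write(a, 'spam and eggs\n')
write(b, 'spam and eggs\n')    # same bytes as 'a'
write(c, 'spam and EGGS\n')    # same size as 'a', different bytes

same = filecmp.cmp(a, b, shallow=0)     # true value
differ = filecmp.cmp(a, c, shallow=0)   # false value
shutil.rmtree(tmp)
```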

  filecmp.cmpfiles(dirname1, dirname2, fnamelist [,shallow=1
                         [,use_statcache=0]])
      Compare those filenames listed in 'fnamelist' if they occur
      in both the directory 'dirname1' and the directory
      'dirname2'.  `filecmp.cmpfiles()` returns a tuple of three
      lists (some of the lists may be empty):
      '(matches,mismatches,errors)'.  'matches' are identical files
      in both directories, 'mismatches' are nonidentical files in
      both directories.  'errors' will contain names if a file
      exists in neither, or in only one, of the two directories,
      or if either file cannot be read for any reason
      (permissions, disk problems, etc.).

      >>> import filecmp, os
      >>> filecmp.cmpfiles('dir1','dir2',['this','that','other'])
      (['this'], ['that'], ['other'])
      >>> print os.popen('ls -l dir1').read()
      -rwxr-xr-x    1 quilty   staff     169 Sep 27 00:13 this
      -rwxr-xr-x    1 quilty   staff     687 Sep 27 00:13 that
      -rwxr-xr-x    1 quilty   staff     737 Sep 27 00:16 other
      -rwxr-xr-x    1 quilty   staff     518 Sep 12 11:57 spam
      >>> print os.popen('ls -l dir2').read()
      -rwxr-xr-x    1 quilty   staff     169 Sep 27 00:13 this
      -rwxr-xr-x    1 quilty   staff     692 Sep 27 00:32 that

      The 'shallow' and 'use_statcache' arguments are the same as
      those to `filecmp.cmp()`.
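
      A self-contained variant of the example above, in Python 3
      syntax, might look like the following.  The directories are
      created with [tempfile], and the helper 'write_file()' is a
      hypothetical convenience, not part of [filecmp].

```python
import filecmp, os, tempfile

def write_file(dirname, name, text):
    # Hypothetical helper: create a small text file
    with open(os.path.join(dirname, name), 'w') as fp:
        fp.write(text)

dir1 = tempfile.mkdtemp()
dir2 = tempfile.mkdtemp()
write_file(dir1, 'this', 'same\n')
write_file(dir2, 'this', 'same\n')
write_file(dir1, 'that', 'short\n')
write_file(dir2, 'that', 'rather longer\n')
write_file(dir1, 'other', 'only in dir1\n')

# 'other' is missing from dir2, so it lands in the errors list
matches, mismatches, errors = filecmp.cmpfiles(
    dir1, dir2, ['this', 'that', 'other'], shallow=False)
```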

  CLASSES:

  filecmp.dircmp(dirname1, dirname2 [,ignore=... [,hide=...]])
      Create a directory comparison object.  'dirname1' and
      'dirname2' are two directories to compare.  The optional
      argument 'ignore' is a sequence of pathnames to ignore
      and defaults to '["RCS","CVS","tags"]'; 'hide' is a
      sequence of pathnames to hide and defaults to
      '[os.curdir,os.pardir]' (i.e., '[".",".."]').

  METHODS AND ATTRIBUTES:

  The attributes of `filecmp.dircmp` are read-only.  Do not
  attempt to modify them.

  filecmp.dircmp.report()
      Print a comparison report on the two directories.

      >>> mycmp = filecmp.dircmp('dir1','dir2')
      >>> mycmp.report()
      diff dir1 dir2
      Only in dir1 : ['other', 'spam']
      Identical files : ['this']
      Differing files : ['that']

  filecmp.dircmp.report_partial_closure()
      Print a comparison report on the two directories, including
      immediate subdirectories.  The method name has nothing to
      do with the theoretical term "closure" from functional
      programming.

  filecmp.dircmp.report_full_closure()
      Print a comparison report on the two directories, recursively
      including all nested subdirectories.

  filecmp.dircmp.left_list
      Pathnames in the 'dirname1' directory, filtering out the
      'hide' and 'ignore' lists.

  filecmp.dircmp.right_list
      Pathnames in the 'dirname2' directory, filtering out the
      'hide' and 'ignore' lists.

  filecmp.dircmp.common
      Pathnames in both directories.

  filecmp.dircmp.left_only
      Pathnames in 'dirname1' but not 'dirname2'.

  filecmp.dircmp.right_only
      Pathnames in 'dirname2' but not 'dirname1'.

  filecmp.dircmp.common_dirs
      Subdirectories in both directories.

  filecmp.dircmp.common_files
      Filenames in both directories.

  filecmp.dircmp.common_funny
      Pathnames in both directories, but of different types.

  filecmp.dircmp.same_files
      Filenames of identical files in both directories.

  filecmp.dircmp.diff_files
      Filenames of nonidentical files whose name occurs in both
      directories.

  filecmp.dircmp.funny_files
      Filenames in both directories where something goes wrong
      during comparison.

  filecmp.dircmp.subdirs
      A dictionary mapping `filecmp.dircmp.common_dirs` strings
      to corresponding `filecmp.dircmp` objects, for example:

      >>> usercmp = filecmp.dircmp('/Users/quilty','/Users/alb')
      >>> usercmp.subdirs['Public'].common
      ['Drop Box']
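
      As a rough sketch of how several of these attributes fit
      together (Python 3 syntax; the directory layout and the
      helper 'populate()' are invented for illustration):

```python
import filecmp, os, tempfile

def populate(root, files):
    # Hypothetical helper: 'files' maps relative paths to contents
    for relpath, text in files.items():
        full = os.path.join(root, relpath)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w') as fp:
            fp.write(text)

left = tempfile.mkdtemp()
right = tempfile.mkdtemp()
populate(left, {'this': 'same\n', 'that': 'short\n',
                'spam': 'left only\n',
                os.path.join('Public', 'note'): 'hello\n'})
populate(right, {'this': 'same\n', 'that': 'rather longer\n',
                 os.path.join('Public', 'note'): 'hello\n'})

mycmp = filecmp.dircmp(left, right)
summary = {
    'left_only': sorted(mycmp.left_only),
    'common_dirs': sorted(mycmp.common_dirs),
    'same_files': sorted(mycmp.same_files),
    'diff_files': sorted(mycmp.diff_files),
    # Each common subdirectory has its own dircmp object
    'nested_same': sorted(mycmp.subdirs['Public'].same_files),
}
```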

  SEE ALSO, `os.stat()`, `os.listdir()`

  =================================================================
    MODULE -- fileinput : Read multiple files or STDIN
  =================================================================

  Many utilities, especially on Unix-like systems, operate
  line-by-line on one or more files and/or on redirected input. A
  flexibility in treating input sources in a homogeneous fashion is
  part of the "Unix philosophy." The [fileinput] module allows you
  to write a Python application that uses these common conventions
  with almost no special programming to adjust to input sources.

  A common, minimal, but extremely useful Unix utility is 'cat',
  which simply writes its input to STDOUT (allowing redirection
  of STDOUT as needed).  Below are a few simple examples of 'cat':

      #*---------- Examples of 'cat' utility ---------#
      % cat a
      AAAAA
      % cat a b
      AAAAA
      BBBBB
      % cat - b < a
      AAAAA
      BBBBB
      % cat < b
      BBBBB
      % cat a < b
      AAAAA
      % echo "XXX" | cat a -
      AAAAA
      XXX

  Notice that STDIN is read only if either "-" is given as an
  argument, or no arguments are given at all. We can implement a
  Python version of 'cat' using the [fileinput] module as follows:

      #------------- cat.py -----------------#
      #!/usr/bin/env python
      import fileinput
      for line in fileinput.input():
              print line,

  FUNCTIONS:

  fileinput.input([files=sys.argv[1:] [,inplace=0 [,backup=".bak"]]])
      Most commonly, this function will be used without any of
      its optional arguments, as in the introductory example of
      'cat.py'.  However, behavior may be customized for special
      cases.

      The argument 'files' is a sequence of filenames to process.
      By default, it consists of all the arguments given on the
      command line.  Commonly, however, you might want to treat
      some of these arguments as flags rather than filenames
      (e.g., if they start with '-' or '/').  Any list of
      filenames you like may be used as the 'files' argument,
      whether or not it is built from 'sys.argv'.

      If you specify a true value for 'inplace', output will go
      into each file specified rather than to STDOUT.  Input
      taken from STDIN, however, will still go to STDOUT.  For
      in-place operation, a temporary backup file is created as
      the actual input source and is given the extension indicated
      by the 'backup' argument.  For example:

      #*------ Modifying files in place with [fileinput] ------#
      % cat a b
      AAAAA
      BBBBB
      % cat modify.py
      #!/usr/bin/env python
      import fileinput, sys
      for line in fileinput.input(sys.argv[1:], inplace=1):
              print "MODIFIED", line,
      % echo "XXX" | ./modify.py a b -
      MODIFIED XXX
      % cat a b
      MODIFIED AAAAA
      MODIFIED BBBBB

  fileinput.close()
      Close the input sequence.

  fileinput.nextfile()
      Close the current file, and proceed to the next one.  Any
      unread lines in the current file will not be counted
      towards the line total.

  There are several functions in the [fileinput] module that
  provide information about the current input state.  These tests
  can be used to process the current line in a context-dependent
  way.

  fileinput.filelineno()
      The number of lines read from the current file.

  fileinput.filename()
      The name of the file from which the last line was read.
      Before a line is read, the function returns 'None'.

  fileinput.isfirstline()
      Same as 'fileinput.filelineno()==1'.

  fileinput.isstdin()
      True if the last line read was from STDIN.

  fileinput.lineno()
      The number of lines read during the input loop, cumulative
      between files.
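
  The state functions above can be combined in one loop.  The
  following is a minimal sketch in Python 3 syntax; the two input
  files are created on the fly, and their names and contents are
  assumptions made only for the demonstration.

```python
import fileinput, os, tempfile

workdir = tempfile.mkdtemp()
names = []
for name, text in [('a', 'AAAAA\naaaaa\n'), ('b', 'BBBBB\n')]:
    path = os.path.join(workdir, name)
    with open(path, 'w') as fp:
        fp.write(text)
    names.append(path)

report = []
for line in fileinput.input(files=names):
    report.append((os.path.basename(fileinput.filename()),
                   fileinput.filelineno(),   # position within this file
                   fileinput.lineno(),       # cumulative position
                   fileinput.isfirstline(),
                   line.rstrip()))
fileinput.close()
```

  Each tuple in 'report' records where a line came from, so a
  program could, for instance, treat the first line of every file
  as a header.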

  CLASSES:

  fileinput.FileInput([files [,inplace=0 [,backup=".bak"]]])
      The methods of `fileinput.FileInput` are the same as the
      module-level functions, plus an additional '.readline()'
      method that matches that of file objects.
      `fileinput.FileInput` objects also have a '.__getitem__()'
      method to support sequential access.

      The arguments to initialize a `fileinput.FileInput` object
      are the same as those passed to the `fileinput.input()`
      function.  The class exists primarily in order to allow
      subclassing.  For normal usage, it is best to just use the
      [fileinput] functions.

  SEE ALSO, [multifile], [xreadlines]

  =================================================================
    MODULE -- glob : Filename globbing utility
  =================================================================

  The [glob] module provides a list of pathnames matching a
  glob-style pattern.  The [fnmatch] module is used internally
  to determine whether a path matches.

  FUNCTIONS:

  glob.glob(pat)
      Both directories and plain files are returned, so if you are
      only interested in one type of path, use `os.path.isdir()`
      or `os.path.isfile()`;  other functions in [os.path] also
      support other filters.

      Pathnames returned by `glob.glob()` contain as much absolute
      or relative path information as the pattern 'pat' gives.
      For example:

      >>> import glob, os.path
      >>> glob.glob('/Users/quilty/Book/chap[3-4].txt')
      ['/Users/quilty/Book/chap3.txt', '/Users/quilty/Book/chap4.txt']
      >>> glob.glob('chap[3-6].txt')
      ['chap3.txt', 'chap4.txt', 'chap5.txt', 'chap6.txt']
      >>> filter(os.path.isdir, glob.glob('/Users/quilty/Book/[A-Z]*'))
      ['/Users/quilty/Book/SCRIPTS', '/Users/quilty/Book/XML']
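
      The filtering idea can also be tried without depending on
      any particular directory layout.  In this sketch (Python 3
      syntax) the files are placeholders created in a temporary
      directory, and one of the matching names is deliberately a
      directory.

```python
import glob, os, tempfile

workdir = tempfile.mkdtemp()
for name in ['chap1.txt', 'chap2.txt', 'chap7.txt', 'notes.md']:
    with open(os.path.join(workdir, name), 'w') as fp:
        fp.write('placeholder\n')
os.mkdir(os.path.join(workdir, 'chap3.txt'))   # a directory, not a file

pattern = os.path.join(workdir, 'chap[1-3].txt')
hits = sorted(glob.glob(pattern))              # files and directories
files_only = [p for p in hits if os.path.isfile(p)]
basenames = [os.path.basename(p) for p in files_only]
```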

  SEE ALSO, [fnmatch], [os.path]

  =================================================================
    MODULE -- linecache : Cache lines from files
  =================================================================

  The module [linecache] can be used to simulate relatively
  efficient random access to the lines in a file.  Lines that are
  read are cached for later access.

  FUNCTIONS:

  linecache.getline(fname, linenum)
      Read line 'linenum' from the file named 'fname'.  If an
      error occurs reading the line, the function will catch the
      error and return an empty string.  'sys.path' is also
      searched for the filename if it is not found in the
      current directory.

      >>> import linecache
      >>> linecache.getline('/etc/hosts', 15)
      '192.168.1.108   hermes  hermes.gnosis.lan\n'

  linecache.clearcache()
      Clear the cache of read lines.

  linecache.checkcache()
      Check whether files in the cache have been modified since
      they were cached.
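
  A short sketch of these functions in Python 3 syntax follows.
  The file 'hosts.txt' and its contents are invented; the point is
  that line numbers start at 1 and that errors produce an empty
  string rather than an exception.

```python
import linecache, os, tempfile

workdir = tempfile.mkdtemp()
fname = os.path.join(workdir, 'hosts.txt')
with open(fname, 'w') as fp:
    fp.write('first\nsecond\nthird\n')

second = linecache.getline(fname, 2)      # line numbers start at 1
missing = linecache.getline(fname, 99)    # out of range: empty string
no_file = linecache.getline(os.path.join(workdir, 'absent'), 1)
linecache.clearcache()                    # drop the cached lines
```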

  =================================================================
    MODULE -- os.path : Common pathname manipulations
  =================================================================

  The [os.path] module provides a variety of functions to analyze
  and manipulate filesystem paths in a cross-platform fashion.

  FUNCTIONS:

  os.path.abspath(pathname)
      Return an absolute path for a (relative) pathname.

      >>> os.path.abspath('SCRIPTS/mk_book')
      '/Users/quilty/Book/SCRIPTS/mk_book'

  os.path.basename(pathname)
      Same as 'os.path.split(pathname)[1]'.

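  os.path.commonprefix(pathlist)
      Return the longest prefix shared by all elements of the
      sequence 'pathlist'; in the typical case this is the most
      nested parent directory they share.  The comparison is
      made character by character, however, so the result may
      end in the middle of a path component.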

      >>> os.path.commonprefix(['/usr/X11R6/bin/twm',
      ...                       '/usr/sbin/bash',
      ...                       '/usr/local/bin/dada'])
      '/usr/'
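
      Because the comparison works on characters rather than on
      path components, the result is not always a directory name.
      A small sketch (Python 3 syntax; the trimming step is my own
      suggestion and assumes POSIX-style '/' separators):

```python
import os.path

paths = ['/usr/lib/python', '/usr/local/bin/dada']
prefix = os.path.commonprefix(paths)     # stops inside 'lib'/'local'

# Trim back to the last separator to get a real directory
# (assumes POSIX-style '/' separators, as in the examples here)
parent = prefix[:prefix.rfind('/') + 1]
```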

  os.path.dirname(pathname)
      Same as 'os.path.split(pathname)[0]'.

  os.path.exists(pathname)
      Return true if the pathname 'pathname' exists.

  os.path.expanduser(pathname)
      Expand pathnames that include the tilde character: '~'.
      Under standard Unix shells, an initial tilde refers to a
      user's home directory, and a tilde followed by a name refers
      to the named user's home directory.  This function emulates
      that behavior on other platforms.

      >>> os.path.expanduser('~alb')
      '/Users/alb'
      >>> os.path.expanduser('~/Book')
      '/Users/quilty/Book'

  os.path.expandvars(pathname)
      Expand 'pathname' by replacing environment variables in a
      Unix shell style.  While this function is in the [os.path]
      module, you could equally use it for bash-like scripting in
      Python, generally (this is not necessarily a good idea, but
      it is possible).

      >>> os.path.expandvars('$HOME/Book')
      '/Users/quilty/Book'
      >>> from os.path import expandvars as ev  # Python 2.0+
      >>> if ev('$HOSTTYPE')=='macintosh' and ev('$OSTYPE')=='darwin':
      ...     print ev("The vendor is $VENDOR, the CPU is $MACHTYPE")
      ...
      The vendor is apple, the CPU is powerpc
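
      A variation that does not depend on the local environment
      is sketched below in Python 3 syntax.  The variable names
      'TPIP_BOOK' and 'TPIP_UNDEFINED' are invented for the
      example; note that an undefined variable is simply left in
      place.

```python
import os

# 'TPIP_BOOK' is a made-up variable, set only for this demonstration
os.environ['TPIP_BOOK'] = '/Users/quilty/Book'

expanded = os.path.expandvars('$TPIP_BOOK/SCRIPTS')
braces = os.path.expandvars('${TPIP_BOOK}/XML')       # ${name} form
untouched = os.path.expandvars('$TPIP_UNDEFINED/x')   # not defined
```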

  os.path.getatime(pathname)
      Return the last access time of 'pathname' (or raise
      'os.error' if checking is not possible).

  os.path.getmtime(pathname)
      Return the modification time of 'pathname' (or raise
      'os.error' if checking is not possible).

  os.path.getsize(pathname)
      Return the size of 'pathname' in bytes (or raise 'os.error'
      if checking is not possible).

  os.path.isabs(pathname)
      Return true if 'pathname' is an absolute path.

  os.path.isdir(pathname)
      Return true if 'pathname' is a directory.

  os.path.isfile(pathname)
      Return true if 'pathname' is a regular file (including
      symbolic links).

  os.path.islink(pathname)
      Return true if 'pathname' is a symbolic link.

  os.path.ismount(pathname)
      Return true if 'pathname' is a mount point (on POSIX
      systems).

  os.path.join(path1 [,path2 [...]])
      Join multiple path components intelligently.

      >>> os.path.join('/Users/quilty/','Book','SCRIPTS/','mk_book')
      '/Users/quilty/Book/SCRIPTS/mk_book'

  os.path.normcase(pathname)
      Convert 'pathname' to canonical lowercase on
      case-insensitive filesystems.  Also convert slashes on
      Windows systems.

  os.path.normpath(pathname)
      Remove redundant path information.

      >>> os.path.normpath('/usr/local/bin/../include/./slang.h')
      '/usr/local/include/slang.h'

  os.path.realpath(pathname)
      Return the "real" path to 'pathname' after de-aliasing any
      symbolic links.  New in Python 2.2+.

      >>> os.path.realpath('/usr/bin/newaliases')
      '/usr/sbin/sendmail'

  os.path.samefile(pathname1, pathname2)
      Return true if 'pathname1' and 'pathname2' are the same
      file.

      SEE ALSO, [filecmp]

  os.path.sameopenfile(fp1, fp2)
      Return true if the file handles 'fp1' and 'fp2' refer to the
      same file.  Not available on Windows.

  os.path.split(pathname)
      Return a tuple containing the path leading up to the named
      pathname and the named directory or filename in isolation.

      >>> os.path.split('/Users/quilty/Book/SCRIPTS')
      ('/Users/quilty/Book', 'SCRIPTS')
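
      The relationship among `os.path.split()`,
      `os.path.dirname()`, `os.path.basename()`, and
      `os.path.join()` can be checked with a sketch like this one
      (Python 3 syntax; the path is purely illustrative and need
      not exist):

```python
import os.path

pathname = os.path.join('Users', 'quilty', 'Book', 'chap1.txt')
head, tail = os.path.split(pathname)

# split() is just dirname() and basename() taken together
same_as_dirname = (head == os.path.dirname(pathname))
same_as_basename = (tail == os.path.basename(pathname))

# Joining the two halves restores the original pathname
rejoined = os.path.join(head, tail)
```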

  os.path.splitdrive(pathname)
      Return a tuple containing the drive letter and the rest of
      the path.  On systems that do not use a drive letter, the
      drive letter is empty (as it is where none is specified on
      Windows-like systems).

  os.path.walk(pathname, visitfunc, arg)
      For every directory recursively contained in 'pathname'
      (including 'pathname' itself), call
      'visitfunc(arg,dirname,pathnames)', where 'pathnames' is
      the list of names in the directory 'dirname'.

      >>> def big_files(minsize, dirname, files):
      ...     for file in files:
      ...         fullname = os.path.join(dirname,file)
      ...         if os.path.isfile(fullname):
      ...             if os.path.getsize(fullname) >= minsize:
      ...                 print fullname
      ...
      >>> os.path.walk('/usr/', big_files, 5e6)
      /usr/lib/libSystem.B_debug.dylib
      /usr/lib/libSystem.B_profile.dylib
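
      Python 2.3 and later also provide the generator function
      `os.walk()`, and in Python 3 it is the only one of the two
      that remains.  The sketch below, in Python 3 syntax,
      restates 'big_files()' in that style; the directory tree
      is created only for the demonstration.

```python
import os, tempfile

def big_files(root, minsize):
    # Yield regular files under 'root' of at least 'minsize' bytes
    for dirname, subdirs, files in os.walk(root):
        for name in files:
            fullname = os.path.join(dirname, name)
            if not os.path.isfile(fullname):
                continue
            if os.path.getsize(fullname) >= minsize:
                yield fullname

# Hypothetical directory tree for demonstration
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'lib', 'deep'))
with open(os.path.join(root, 'small'), 'w') as fp:
    fp.write('x' * 10)
with open(os.path.join(root, 'lib', 'deep', 'large'), 'w') as fp:
    fp.write('x' * 5000)

found = [os.path.relpath(p, root) for p in big_files(root, 1000)]
```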

  =================================================================
    MODULE -- shutil : Copy files and directory trees
  =================================================================

  The functions in the [shutil] module make working with files a
  bit easier. There is nothing in this module that you could not do
  using basic file objects and [os.path] functions, but [shutil]
  often provides a more direct means and handles minor details for
  you.  The functions in [shutil] match fairly closely the
  capabilities you would find in Unix file system utilities like
  'cp' and 'rm'.

  FUNCTIONS:

  shutil.copy(src, dst)
      Copy the file named 'src' to the pathname 'dst'.  If 'dst'
      is a directory, the created file is given the name
      'os.path.join(dst,os.path.basename(src))'.

      SEE ALSO, `os.path.join()`, `os.path.basename()`

  shutil.copy2(src, dst)
      Same as `shutil.copy()` except that the access and
      modification times of 'dst' are set to the values in 'src'.

  shutil.copyfile(src, dst)
      Copy the file named 'src' to the filename 'dst' (overwriting
      'dst' if present).  Basically, this has the same effect as
      'open(dst,"wb").write(open(src,"rb").read())'.

  shutil.copyfileobj(fpsrc, fpdst [,buffer=-1])
      Copy the file-like object 'fpsrc' to the file-like object
      'fpdst'.  If the optional argument 'buffer' is given, only
      the specified number of bytes are read into memory at a
      time; this allows copying very large files.

  shutil.copymode(src, dst)
      Copy the permission bits from the file named 'src' to the
      filename 'dst'.

  shutil.copystat(src, dst)
      Copy the permission and timestamp data from the file named
      'src' to the filename 'dst'.

  shutil.copytree(src, dst [,symlinks=0])
      Copy the directory 'src' to the destination 'dst'
      recursively.  If the optional argument 'symlinks' is a
      true value, copy symbolic links as links rather than the
      default behavior of copying the content of the link
      target.  This function may not be entirely reliable on
      every platform and filesystem.

  shutil.rmtree(dirname [,ignore [,errorhandler]])
      Remove an entire directory tree rooted at 'dirname'.  If
      optional argument 'ignore' is a true value, errors will be
      silently ignored.  If 'errorhandler' is given, a custom
      error handler is used to catch errors.  This function may
      not be entirely reliable on every platform and filesystem.
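
  The most commonly used [shutil] functions can be exercised in
  a scratch directory.  The following is a minimal sketch in
  Python 3 syntax; the names 'src.txt', 'backup', and 'backup2'
  are made up for the example.

```python
import os, shutil, tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src.txt')
with open(src, 'w') as fp:
    fp.write('some text\n')

# Copy into a directory: the basename of 'src' is kept
destdir = os.path.join(workdir, 'backup')
os.mkdir(destdir)
shutil.copy(src, destdir)
copied = os.path.join(destdir, os.path.basename(src))
copied_exists = os.path.isfile(copied)

# Duplicate the whole directory, then remove the duplicate
treecopy = os.path.join(workdir, 'backup2')
shutil.copytree(destdir, treecopy)
tree_listing = os.listdir(treecopy)
shutil.rmtree(treecopy)
tree_gone = not os.path.exists(treecopy)
```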


  SEE ALSO, `open()`, [os.path]

  =================================================================
    MODULE -- stat : Constants/functions for os.stat()
  =================================================================

  The [stat] module provides two types of support for analyzing the
  results of `os.stat()`, `os.lstat()`, and `os.fstat()` calls.

  Several functions exist to allow you to perform tests on a file.
  If you simply wish to check one predicate of a file, it is more
  direct to use one of the `os.path.is*()` functions, but for
  performing several such tests, it is faster to read the mode once
  and perform several `stat.S_*()` tests.

  As well as helper functions, [stat] defines symbolic constants
  to access the fields of the 10-tuple returned by `os.stat()`
  and friends.   For example:

      >>> from stat import *
      >>> import os
      >>> fileinfo = os.stat('chap1.txt')
      >>> fileinfo[ST_SIZE]
      68666L
      >>> mode = fileinfo[ST_MODE]
      >>> S_ISSOCK(mode)
      0
      >>> S_ISDIR(mode)
      0
      >>> S_ISREG(mode)
      1
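
  The advice about reading the mode once can be sketched as a
  small classifier.  This is written in Python 3 syntax; the
  function name 'classify()' and the file used to try it out are
  my own inventions.

```python
import os, stat, tempfile

def classify(pathname):
    # One os.lstat() call, then several cheap tests on the mode
    mode = os.lstat(pathname)[stat.ST_MODE]
    if stat.S_ISLNK(mode):
        return 'symlink'
    if stat.S_ISDIR(mode):
        return 'directory'
    if stat.S_ISREG(mode):
        return 'regular file'
    return 'other'

workdir = tempfile.mkdtemp()
fname = os.path.join(workdir, 'chap1.txt')
with open(fname, 'w') as fp:
    fp.write('x' * 100)

kinds = (classify(workdir), classify(fname))
size = os.stat(fname)[stat.ST_SIZE]
```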

  FUNCTIONS:

  stat.S_ISDIR(mode)
      Mode indicates a directory.

  stat.S_ISCHR(mode)
      Mode indicates a character special device file.

  stat.S_ISBLK(mode)
      Mode indicates a block special device file.

  stat.S_ISREG(mode)
      Mode indicates a regular file.

  stat.S_ISFIFO(mode)
      Mode indicates a FIFO (named pipe).

  stat.S_ISLNK(mode)
      Mode indicates a symbolic link.

  stat.S_ISSOCK(mode)
      Mode indicates a socket.

  CONSTANTS:

  stat.ST_MODE
      I-node protection mode.

  stat.ST_INO
      I-node number.

  stat.ST_DEV
      Device.

  stat.ST_NLINK
      Number of links to this i-node.

  stat.ST_UID
      User id of file owner.

  stat.ST_GID
      Group id of file owner.

  stat.ST_SIZE
      Size of file.

  stat.ST_ATIME
      Last access time.

  stat.ST_MTIME
      Modification time.

  stat.ST_CTIME
      Time of last status change.

  =================================================================
    MODULE -- tempfile : Temporary files and filenames
  =================================================================

  The [tempfile] module is useful when you need to store transient
  data using a file-like interface.  In contrast to the file-like
  interface of [StringIO], [tempfile] uses the actual filesystem
  for storage rather than simulating the interface to a file in
  memory.  In memory-constrained contexts, therefore, [tempfile]
  is preferable.

  The temporary files created by [tempfile] are as secure against
  external modification as is supported by the underlying
  platform.  You can be fairly confident that your temporary data
  will not be read or changed either while your program is
  running or afterwards (temporary files are deleted when
  closed).  While you should not count on [tempfile] to provide
  you with cryptographic-level security, it is good enough to
  prevent accidents and casual inspection.

  FUNCTIONS:

  tempfile.mktemp([suffix=""])
      Return an absolute path to a unique temporary filename.  If
      optional argument 'suffix' is specified, the name will end
      with the 'suffix' string.

  tempfile.TemporaryFile([mode="w+b" [,buffsize=-1 [,suffix=""]]])
      Return a temporary file object.  In general, there is
      little reason to change the default 'mode' argument of
      'w+b'; there is no existing file to append to before
      the creation, and it does little good to write temporary
      data you cannot read.  Likewise, the optional 'suffix'
      argument generally will not ever be visible, since the file
      is deleted when closed.  The default 'buffsize' uses the
      platform defaults, but may be modified if needed.

      >>> import tempfile
      >>> tmpfp = tempfile.TemporaryFile()
      >>> tmpfp.write('this and that\n')
      >>> tmpfp.write('something else\n')
      >>> tmpfp.tell()
      29L
      >>> tmpfp.seek(0)
      >>> tmpfp.read()
      'this and that\nsomething else\n'

  SEE ALSO, [StringIO], [cStringIO]

  =================================================================
    MODULE -- xreadlines : Efficient iteration over a file
  =================================================================

  Reading over the lines of a file had some pitfalls in older
  versions of Python: There was a memory-friendly way, and there
  was a fast way, but never the twain shall meet.  These
  techniques were:

      >>> fp = open('bigfile')
      >>> line = fp.readline()
      >>> while line:
      ...     # Memory-friendly but slow
      ...     # ...do stuff...
      ...     line = fp.readline()

      >>> for line in open('bigfile').readlines():
      ...     # Fast but memory-hungry
      ...     # ...do stuff...

  Fortunately, with Python 2.1 a more efficient technique was
  provided.  In Python 2.2+, this efficient technique was also
  wrapped into a more elegant syntactic form (in keeping with the
  new iterator protocol).  With Python 2.3+, [xreadlines] is
  officially deprecated in favor of "'for line in file:'".

  FUNCTIONS:

  xreadlines.xreadlines(fp)
      Iterate over the lines of file object 'fp' in an efficient
      way (both speed-wise and in memory usage).

      >>> for line in xreadlines.xreadlines(open('tmp')):
      ...     # Efficient all around
      ...     # ...do stuff...

  Corresponding to this [xreadlines] module function is the
  '.xreadlines()' method of file objects.

      >>> for line in open('tmp').xreadlines():
      ...     # As a file object method
      ...     # ...do stuff...

  If you use Python 2.2 or above, an even nicer version is
  available:

      >>> for line in open('tmp'):
      ...     # ...do stuff...

  SEE ALSO, [linecache], `FILE.xreadlines()`, `os.tmpfile()`


  TOPIC -- Running External Commands and Accessing OS Features
  --------------------------------------------------------------------

  =================================================================
    MODULE -- commands : Quick access to external commands
  =================================================================

  The [commands] module exists primarily as a convenience wrapper
  for calls to `os.popen*()` functions on Unix-like systems. STDERR
  is combined with STDOUT in the results.

  FUNCTIONS:

  commands.getoutput(cmd)
      Return the output from running 'cmd'.  This function could
      also be implemented as:

      >>> def getoutput(cmd):
      ...     import os
      ...     return os.popen('{ '+cmd+'; } 2>&1').read()

  commands.getstatusoutput(cmd)
      Return a tuple containing the exit status and output from
      running 'cmd'.  This function could also be implemented as:

      >>> def getstatusoutput(cmd):
      ...     import os
      ...     fp = os.popen('{ '+cmd+'; } 2>&1')
      ...     output = fp.read()
      ...     status = fp.close()
      ...     if not status: status=0 # Want zero rather than None
      ...     return (status, output)
      ...
      >>> getstatusoutput('ls nosuchfile')
      (256, 'ls: nosuchfile: No such file or directory\n')
      >>> getstatusoutput('ls c*[1-3].txt')
      (0, 'chap1.txt\nchap2.txt\nchap3.txt\n')

  commands.getstatus(filename)
      Same as 'commands.getoutput("ls -ld "+filename)'.

  SEE ALSO, `os.popen()`, `os.popen2()`, `os.popen3()`,
  `os.popen4()`

  =================================================================
    MODULE -- os : Portable operating system services
  =================================================================

  The [os] module contains a large number of functions, attributes,
  and constants for calling on or determining features of the
  operating system that Python runs on. In many cases, functions in
  [os] are internally implemented using modules like [posix],
  [os2], [riscos], or [mac], but for portability it is better to use
  the [os] module.

  Not everything in the [os] module is documented in this book. You
  can read about those features that are unlikely to be used in
  text processing applications in the _Python Library Reference_
  that accompanies Python distributions.

  Functions and constants not documented here fall into several
  categories. The functions and attributes `os.confstr()`,
  `os.confstr_names`, `os.sysconf()`, and `os.sysconf_names` let
  you probe system configuration. As well, I skip some functions
  specific to process permissions on Unix-like systems:
  `os.ctermid()`, `os.getegid()`, `os.geteuid()`, `os.getgid()`,
  `os.getgroups()`, `os.getlogin()`, `os.getpgrp()`,
  `os.getppid()`, `os.getuid()`, `os.setegid()`, `os.seteuid()`,
  `os.setgid()`, `os.setgroups()`, `os.setpgrp()`, `os.setpgid()`,
  `os.setreuid()`, `os.setregid()`, `os.setsid()`, and
  `os.setuid(uid)`.

  The functions `os.abort()`, `os.exec*()`, `os._exit()`,
  `os.fork()`, `os.forkpty()`, `os.plock()`, `os.spawn*()`,
  `os.times()`, `os.wait()`, `os.waitpid()`, `os.WIF*()`,
  `os.WEXITSTATUS()`, `os.WSTOPSIG()`, and `os.WTERMSIG()` and the
  constants `os.P_*` and `os.WNOHANG` all deal with process
  creation and management. These are not documented in this book,
  since creating and managing multiple processes is not typically
  central to text processing tasks. However, I briefly document the
  basic capabilities in `os.kill()`, `os.nice()`, `os.startfile()`,
  and `os.system()` and in the `os.popen()` family. Some of the
  omitted functionality can also be found in the [commands] and
  [sys] modules.

  A number of functions in the [os] module allow you to perform
  low-level I/O using file descriptors. In general, it is simpler
  to perform I/O using file objects created with the built-in
  `open()` function or the `os.popen*()` family. These file objects
  provide methods like `FILE.readline()`, `FILE.write()`,
  `FILE.seek()`, and `FILE.close()`. Information about files can be
  determined using the `os.stat()` function or functions in the
  [os.path] and [shutil] modules. Therefore, the functions
  `os.close()`, `os.dup()`, `os.dup2()`, `os.fpathconf()`,
  `os.fstat()`, `os.fstatvfs()`, `os.ftruncate()`, `os.isatty()`,
  `os.lseek()`, `os.open()`, `os.openpty()`, `os.pathconf()`,
  `os.pipe()`, `os.read()`, `os.statvfs()`, `os.tcgetpgrp()`,
  `os.tcsetpgrp()`, `os.ttyname()`, `os.umask()`, and `os.write()`
  are not covered here. As well, the supporting constants `os.O_*`
  and `os.pathconf_names` are omitted.

  SEE ALSO, [commands], [os.path], [shutil], [sys]

  FUNCTIONS:

  os.access(pathname, operation)
      Check the permission for the file or directory 'pathname'.
      If the type of operation specified is allowed, return a true
      value. The argument 'operation' is a number between 0 and
      7, inclusive, and encodes four features: exists, executable,
      writable, and readable. These features have symbolic names:

      >>> import os
      >>> os.F_OK, os.X_OK, os.W_OK, os.R_OK
      (0, 1, 2, 4)

      To query a specific combination of features, you may add or
      bitwise-or the individual features.

      >>> os.access('myfile', os.W_OK | os.R_OK)
      1
      >>> os.access('myfile', os.X_OK + os.R_OK)
      0
      >>> os.access('myfile', 6)
      1
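
      The same checks can be tried on a file whose permissions
      are known.  This sketch uses Python 3 syntax and assumes a
      Unix-like system; the file 'myfile' is created just for the
      test, and the mode 0600 is my choice.

```python
import os, tempfile

workdir = tempfile.mkdtemp()
fname = os.path.join(workdir, 'myfile')
with open(fname, 'w') as fp:
    fp.write('data\n')
os.chmod(fname, 0o600)    # owner may read and write, nobody may execute

can_read_write = os.access(fname, os.R_OK | os.W_OK)
can_execute = os.access(fname, os.X_OK)   # normally false for mode 0600
exists = os.access(fname, os.F_OK)
missing = os.access(os.path.join(workdir, 'nosuchfile'), os.F_OK)
```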

  os.chdir(pathname)
      Change the current working directory to the path
      'pathname'.

      SEE ALSO, `os.getcwd()`

  os.chmod(pathname, mode)
      Change the mode of file or directory 'pathname' to numeric
      mode 'mode'.  See the 'man' page for the 'chmod' utility for
      more information on modes.

  os.chown(pathname, uid, gid)
      Change the owner and group of file or directory 'pathname'
      to 'uid' and 'gid' respectively.  See the 'man' page for
      the 'chown' utility for more information.

  os.chroot(pathname)
      Change the root directory under Unix-like systems (on
      Python 2.2+).  See the 'man' page for the 'chroot' utility
      for more information.

  os.getcwd()
      Return the current working directory as a string.

      >>> os.getcwd()
      '/Users/quilty/Book'

      SEE ALSO, `os.chdir()`

  os.getenv(var [,value=None])
      Return the value of environment variable 'var'.  If the
      environment variable is not defined, return 'value'.  An
      equivalent call is 'os.environ.get(var, value)'.
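
      A minimal sketch of both cases, in Python 3 syntax (the
      variable names beginning with 'TPIP_' are invented for
      illustration):

```python
import os

# 'TPIP_DEMO_VAR' is an invented variable name for illustration
os.environ['TPIP_DEMO_VAR'] = 'spam'

found = os.getenv('TPIP_DEMO_VAR')
fallback = os.getenv('TPIP_NO_SUCH_VAR', 'default')

# The dictionary-style call gives the same answer
same = (fallback == os.environ.get('TPIP_NO_SUCH_VAR', 'default'))
```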

      SEE ALSO, `os.environ`, `os.putenv()`

  os.getpid()
      Return the current process id.  Possibly useful for calls
      to external utilities that use process id's.

      SEE ALSO, `os.kill()`

  os.kill(pid, sig)
      Kill an external process on Unix-like systems.  You will
      need to determine values for the 'pid' argument by some
      means, such as a call to the 'ps' utility.  Values for the
      signal 'sig' sent to the process may be found in the
      [signal] module or with 'man signal'.  For example:

      >>> from signal import *
      >>> SIGHUP, SIGINT, SIGQUIT, SIGIOT, SIGKILL
      (1, 2, 3, 6, 9)
      >>> def kill_by_name(progname):
      ...     pidstr = os.popen('ps|grep '+progname+'|sort').read()
      ...     pid = int(pidstr.split()[0])
      ...     os.kill(pid, 9)
      ...
      >>> kill_by_name('myprog')

  os.link(src, dst)
      Create a hard link from path 'src' to path 'dst' on
      Unix-like systems.  See the 'man' page on the 'ln' utility
      for more information.

      SEE ALSO, `os.symlink()`

  os.listdir(pathname)
      Return a list of the names of files and directories at path
      'pathname'.  The special entries for the current and parent
      directories (typically "." and "..") are excluded from the
      list.

  os.lstat(pathname)
      Information on file or directory 'pathname'.  See
      `os.stat()` for details.  `os.lstat()` does not follow
      symbolic links.

      SEE ALSO, `os.stat()`, [stat]

  os.mkdir(pathname [,mode=0777])
      Create a directory named 'pathname' with the numeric mode
      'mode'.  On some operating systems, 'mode' is ignored.  See
      the 'man' page for the 'chmod' utility for more information
      on modes.

      SEE ALSO, `os.chmod()`, `os.makedirs()`

  os.makedirs(pathname [,mode=0777])
      Create a directory named 'pathname' with the numeric mode
      'mode'.  Unlike `os.mkdir()`, this function will create any
      intermediate directories needed for a nested directory.
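
      A brief sketch of the difference between `os.mkdir()` and
      `os.makedirs()`, in Python 3 syntax.  The nested path is
      hypothetical and lives under a temporary directory.

```python
import os, tempfile

root = tempfile.mkdtemp()
nested = os.path.join(root, 'Book', 'SCRIPTS', 'tmp')

try:
    os.mkdir(nested)            # parents 'Book/SCRIPTS' do not exist yet
    mkdir_worked = True
except OSError:
    mkdir_worked = False

os.makedirs(nested)             # creates 'Book', 'SCRIPTS', and 'tmp'
created = os.path.isdir(nested)
listing = os.listdir(os.path.join(root, 'Book'))
```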

      SEE ALSO, `os.mkdir()`

  os.mkfifo(pathname [,mode=0666])
      Create a named pipe on Unix-like systems.

  os.nice(increment)
      Decrease the process priority of the current application
      under Unix-like systems.  This is useful if you do not wish
      for your application to hog system CPU resources.

  The four functions in the `os.popen*()` family allow you to run
  external processes and capture their STDOUT and STDERR and/or
  set their STDIN.  The members of the family differ somewhat in
  how these three pipes are handled.

  os.popen(cmd [,mode="r" [,bufsize]])
      Open a pipe to or from the external command 'cmd'.  The
      return value of the function is an open file object
      connected to the pipe.  The 'mode' may be 'r' for read (the
      default) or 'w' for write.  The exit status of the command
      is returned when the file object is closed.  An optional
      buffer size 'bufsize' may be specified.

      >>> import os
      >>> def ls(pat):
      ...     stdout = os.popen('ls '+pat)
      ...     result = stdout.read()
      ...     status = stdout.close()
      ...     if status: print "Error status", status
      ...     else: print result
      ...
      >>> ls('nosuchfile')
      ls: nosuchfile: No such file or directory
      Error status 256
      >>> ls('chap[7-9].txt')
      chap7.txt

  os.popen2(cmd [,mode [,bufsize]])
      Open both STDIN and STDOUT pipes to the external command
      'cmd'.  The return value is a pair of file objects
      connecting to the two respective pipes.  'mode' and
      'bufsize' work as with `os.popen()`.

      SEE ALSO, `os.popen3()`, `os.popen()`

  os.popen3(cmd [,mode [,bufsize]])
      Open STDIN, STDOUT and STDERR pipes to the external command
      'cmd'.  The return value is a 3-tuple of file objects
      connecting to the three respective pipes.  'mode' and
      'bufsize' work as with `os.popen()`.

      >>> import os
      >>> stdin, stdout, stderr = os.popen3('sed s/line/LINE/')
      >>> print >>stdin, 'line one'
      >>> print >>stdin, 'line two'
      >>> stdin.write('line three\n')
      >>> stdin.close()
      >>> stdout.read()
      'LINE one\nLINE two\nLINE three\n'
      >>> stderr.read()
      ''

  os.popen4(cmd [,mode [,bufsize]])
      Open STDIN, STDOUT, and STDERR pipes to the external command
      'cmd'.  In contrast to `os.popen3()`, `os.popen4()` combines
      STDOUT and STDERR on the same pipe. The return value is a
      pair of file objects connecting to the two respective pipes.
      'mode' and 'bufsize' work as with `os.popen()`.

      SEE ALSO, `os.popen3()`, `os.popen()`

  os.putenv(var, value)
      Set the environment variable 'var' to the value 'value'.
      Changes to the current environment only affect subprocesses
      of the current process, such as those launched with
      `os.system()` or `os.popen()`, not the whole OS.

      Calls to `os.putenv()` will update the environment, but not
      the `os.environ` variable.  Therefore, it is better to
      update `os.environ` directly (which also changes the
      external environment).
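
      A small sketch of that advice, assuming a shell is available
      to `os.popen()` and that 'sys.executable' names a working
      interpreter.  The variable name 'TP_DEMO_VAR' is made up for
      the example.

```python
import os
import sys

# 'TP_DEMO_VAR' is a made-up variable name used only for this sketch.
os.environ['TP_DEMO_VAR'] = 'vt220'

# Ask a child Python process to report the variable it inherited.
child_code = "import os; print(os.environ['TP_DEMO_VAR'])"
pipe = os.popen('"%s" -c "%s"' % (sys.executable, child_code))
seen_by_child = pipe.read().strip()
pipe.close()

# The parent sees the same value through os.getenv().
seen_by_parent = os.getenv('TP_DEMO_VAR')
```

      Assigning through `os.environ` keeps the dictionary-like
      object and the real environment in step, so both the parent
      and the child report the new value.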

      SEE ALSO, `os.environ`, `os.getenv()`, `os.popen()`,
      `os.system()`

  os.readlink(linkname)
      Return a string containing the path that the symbolic link
      'linkname' points to.  Works on Unix-like systems.

      SEE ALSO, `os.symlink()`

  os.remove(filename)
      Remove the file named 'filename'.  This function is
      identical to `os.unlink()`.  If the file cannot be removed,
      an 'OSError' is raised.

      SEE ALSO, `os.unlink()`

  os.removedirs(pathname)
      Remove the directory named 'pathname', then remove each
      parent directory named in 'pathname' that is left empty.
      This function will not remove directories with files, and
      will raise an 'OSError' if the leaf directory cannot be
      removed.

      SEE ALSO, `os.rmdir()`

  os.rename(src, dst)
      Rename the file or directory 'src' as 'dst'.  Depending on
      the operating system, the operation may raise an 'OSError'
      if 'dst' already exists.

      SEE ALSO, `os.renames()`

  os.renames(src, dst)
      Rename the file or directory 'src' as 'dst'.  Unlike
      `os.rename()`, this function will create any intermediate
      directories needed for a nested directory.

      SEE ALSO, `os.rename()`

  os.rmdir(pathname)
      Remove the directory named 'pathname'.  This function will
      not remove nonempty directories and will raise an
      'OSError' if you attempt to do so.

      SEE ALSO, `os.removedirs()`

  os.startfile(path)
      Launch an application under Windows.  The behavior
      is the same as if 'path' was double-clicked in a Drives
      window or as if you typed 'start <path>' at a command line.
      Using Windows associations, a data file can be launched in
      the same manner as an actual executable application.

      SEE ALSO, `os.system()`

  os.stat(pathname)
      Create a 'stat_result' object that contains information on
      the file or directory 'pathname'.  A 'stat_result' object
      has a number of attributes and also behaves like a tuple
      of numeric values.  Before Python 2.2, only the tuple was
      provided.  The attributes of a 'stat_result' object are
      named the same as the constants in the [stat] module, but
      in lowercase.

      >>> import os, stat
      >>> file_info = os.stat('chap1.txt')
      >>> file_info.st_size
      87735L
      >>> file_info[stat.ST_SIZE]
      87735L

      On some platforms, additional attributes are available.
      For example, Unix-like systems usually have '.st_blocks',
      '.st_blksize', and '.st_rdev' attributes; MacOS has
      '.st_rsize', '.st_creator', and '.st_type'; RISCOS has
      '.st_ftype', '.st_attrs', and '.st_obtype'.
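
      The two access styles can be compared on a scratch file of
      known size.  This is only a sketch; the file is created with
      [tempfile] so that nothing about your own filesystem is
      assumed.

```python
import os
import stat
import tempfile

# Write a scratch file of known size (26 bytes).
fd, path = tempfile.mkstemp()
os.write(fd, b'abcdefghijklmnopqrstuvwxyz')
os.close(fd)

info = os.stat(path)
by_attribute = info.st_size            # attribute access
by_index = info[stat.ST_SIZE]          # tuple-style access, same value
is_regular = stat.S_ISREG(info.st_mode)

os.remove(path)
```

      Both 'by_attribute' and 'by_index' hold 26, and the helper
      `stat.S_ISREG()` confirms that the mode describes a regular
      file.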

      SEE ALSO, [stat], `os.lstat()`

  os.strerror(code)
      Give a description for a numeric error code 'code', such
      as that returned by 'os.popen(bad_cmd).close()'.

      SEE ALSO, `os.popen()`

  os.symlink(src, dst)
      Create a soft link from path 'src' to path 'dst' on
      Unix-like systems.  See the 'man' page on the 'ln' utility
      for more information.

      SEE ALSO, `os.link()`, `os.readlink()`

  os.system(cmd)
      Execute the command 'cmd' in a subshell.  Unlike execution
      using `os.popen()`, the output of the executed process is
      not captured (but it may still echo to the same terminal as
      the current Python application).  In some cases, you can
      use `os.system()` on non-Windows systems to detach an
      application in a manner similar to `os.startfile()`.  For
      example, under MacOSX, you could launch the TextEdit
      application with:

      >>> import os
      >>> cmd="/Applications/TextEdit.app/Contents/MacOS/TextEdit &"
      >>> os.system(cmd)
      0

      SEE ALSO, `os.popen()`, `os.startfile()`, [commands]

  os.tempnam([dir [,prefix]])
      Return a unique filename for a temporary file.  If optional
      argument 'dir' is specified, that directory will be used in
      the path; if 'prefix' is specified, the file will have the
      indicated prefix.  For most purposes, it is more secure to
      use `os.tmpfile()` to directly obtain a file object rather
      than first generating a name.

      SEE ALSO, [tempfile], `os.tmpfile()`

  os.tmpfile()
      Return an "invisible" file object in update mode.  This
      file does not create a directory entry, but simply acts as
      a transient buffer for data on the filesystem.

      SEE ALSO, [tempfile], [StringIO], [cStringIO]

  os.uname()
      Return detailed information about the current operating
      system on recent Unix-like systems.  The returned 5-tuple
      contains sysname, nodename, release, version, and machine,
      each as descriptive strings.

  os.unlink(filename)
      Remove the file named 'filename'.  This function is
      identical to `os.remove()`.  If the file cannot be removed,
      an 'OSError' is raised.

      SEE ALSO, `os.remove()`

  os.utime(pathname, times)
      Set the access and modification timestamps of file 'pathname'
      to the tuple '(atime, mtime)' specified in 'times'.
      Alternately, if 'times' is 'None', set both timestamps to
      the current time.

      SEE ALSO, [time], `os.chmod()`, `os.chown()`, `os.stat()`

  CONSTANTS AND ATTRIBUTES:

  os.altsep
      Usually 'None', but an alternative path delimiter ("/")
      under Windows.

  os.curdir
      The string the operating system uses to refer to the
      current directory; for example, "." on Unix or ":" on Macintosh
      (before MacOSX).

  os.defpath
      The search path used by exec*p*() and spawn*p*() absent a
      PATH environment variable.

  os.environ
      A dictionary-like object containing the current
      environment.

      >>> os.environ['TERM']
      'vt100'
      >>> os.environ['TERM'] = 'vt220'
      >>> os.getenv('TERM')
      'vt220'

      SEE ALSO, `os.getenv()`, `os.putenv()`

  os.linesep
      The string that delimits lines in a file; for example "\n"
      on Unix, "\r" on Macintosh, "\r\n" on Windows.

  os.name
      A string identifying the operating system the current
      Python interpreter is running on.  Possible strings include
      'posix', 'nt', 'dos', 'mac', 'os2', 'ce', 'java', and
      'riscos'.

  os.pardir
      The string the operating system uses to refer to the parent
      directory; for example, ".." on Unix or "::" on Macintosh (before
      MacOSX).

  os.pathsep
      The string that delimits search paths; for example, ";" on
      Windows or ":" on Unix.

  os.sep
      The string the operating system uses to refer to path
      delimiters; for example "/" on Unix, "\" on Windows, ":" on
      Macintosh.
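
  A short sketch of how these constants combine in portable code.
  The helper names 'join_path()' and 'split_search_path()' are
  hypothetical; in real applications [os.path] already provides
  richer versions of the first.

```python
import os

def join_path(*parts):
    "Join path components with the platform's delimiter"
    return os.sep.join(parts)

def split_search_path(value):
    "Split a PATH-style string into its component directories"
    return value.split(os.pathsep)

# Refer to a file in a sibling directory without hard-coding "/" or "..".
relative = join_path(os.pardir, 'texts', 'chap1.txt')

# Build and then split a search path using the platform's separator.
search = split_search_path(os.pathsep.join(['bin', 'tools']))
```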

  SEE ALSO, [sys], [os.path]


  TOPIC -- Special Data Values and Formats
  --------------------------------------------------------------------

  =================================================================
    MODULE -- random : Pseudo-random value generator
  =================================================================

  Python provides better pseudo-random number generation than do
  most C libraries with a 'rand()' function, but not good enough
  for cryptographic purposes. The period of Python's Wichmann-Hill
  generator is about 7 trillion (7e12), but that merely indicates
  how long it will take a particular seeded generator to cycle; a
  different seed will produce a different sequence of numbers.
  Python 2.3 uses the superior Mersenne Twister generator, which
  has a longer period and has been better analyzed. For practical
  purposes, pseudo-random numbers generated by Python are more than
  adequate for random-seeming behavior in applications.

  The underlying pseudo-random numbers generated by the [random]
  module can be mapped into a variety of nonuniform patterns and
  distributions. Moreover, you can capture and tinker with the
  state of a pseudo-random generator; you can even subclass the
  `random.Random` class that operates behind the scenes. However,
  this latter sort of specialization is outside the scope of this
  book, and the class `random.Random`, and functions
  `random.getstate()`, `random.jumpahead()`, and `random.setstate()`
  are omitted from this discussion. The functions `random.whseed()`
  and `random.randint()` are deprecated.

  FUNCTIONS:

  random.betavariate(alpha, beta)
      Return a floating point value in the range [0.0, 1.0) with
      a beta distribution.

  random.choice(seq)
      Select a random element from the nonempty sequence 'seq'.

  random.cunifvariate(mean, arc)
      Return a floating point value in the range
      ['mean-arc/2', 'mean+arc/2') with a circular uniform
      distribution.  Arguments and result are expressed in
      radians.

  random.expovariate(lambda_)
      Return a floating point value in the range [0.0, +inf) with
      an exponential distribution.  The argument 'lambda_' gives
      the -inverse- of the mean of the distribution.

      >>> import random
      >>> t1,t2 = 0,0
      >>> for x in range(100):
      ...     t1 += random.expovariate(1./20)
      ...     t2 += random.expovariate(20.)
      ...
      >>> print t1/100, t2/100
      18.4021962198 0.0558234063338

  random.gammavariate(alpha, beta)
      Return a floating point value with a gamma distribution
      (not the gamma function).

  random.gauss(mu, sigma)
      Return a floating point value with a Gaussian
      distribution; the mean is 'mu' and the standard deviation
      is 'sigma'.  `random.gauss()` is slightly faster than
      `random.normalvariate()`.

  random.lognormvariate(mu, sigma)
      Return a floating point value with a log normal
      distribution; the natural logarithm of this distribution is
      Gaussian with mean 'mu' and standard deviation 'sigma'.

  random.normalvariate(mu, sigma)
      Return a floating point value with a Gaussian distribution;
      the mean is 'mu' and the standard deviation is 'sigma'.

  random.paretovariate(alpha)
      Return a floating point value with a Pareto distribution.
      'alpha' specifies the shape parameter.

  random.random()
      Return a floating point value in the range [0.0, 1.0).

  random.randrange([start=0,] stop [,step=1])
      Return a random element from the specified range.
      Functionally equivalent to the expression
      'random.choice(range(start,stop,step))', but it does not
      build the actual range object.  Use `random.randrange()` in
      place of the deprecated `random.randint()`.
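
      A quick sketch of the stated equivalence.  The bounds are
      arbitrary; the point is only that every value drawn is one
      that the explicit range would have contained.

```python
import random

start, stop, step = 10, 50, 5
allowed = list(range(start, stop, step))   # 10, 15, ..., 45

# Every value drawn must be one that the explicit range would contain.
draws = [random.randrange(start, stop, step) for _ in range(200)]
all_in_range = all([d in allowed for d in draws])
```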

  random.seed([x=time.time()])
      Initialize the Wichmann-Hill generator.  You do not
      necessarily -need- to call `random.seed()`, since the
      current system time is used to initialize the generator
      upon module import.  But if you wish to provide more
      entropy in the initial state, you may pass any hashable
      object as argument 'x'.  Your best choice for 'x' is a
      positive long integer less than 27814431486575L, whose
      value is selected at random by independent means.
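
      Explicit seeding is also what makes a pseudo-random sequence
      repeatable, which is handy in tests.  A minimal sketch; the
      helper name 'sample()' and the seed values are arbitrary.

```python
import random

def sample(seed, n=5):
    "Return n pseudo-random floats drawn after seeding with 'seed'"
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = sample(12345)
second = sample(12345)      # same seed: same sequence
other = sample(54321)       # different seed: a different sequence
```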

  random.shuffle(seq [,random=random.random])
      Permute the mutable sequence 'seq' in place.  An optional
      argument 'random' may be specified to use an alternate
      random generator, but it is unlikely you will want to use
      one.  Possible permutations get very big very quickly, so
      even for moderately sized sequences, not every permutation
      will occur.
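
      A back-of-the-envelope check of that last claim.  The sketch
      assumes a period of 6,953,607,871,644 for the Wichmann-Hill
      generator (the "about 7 trillion" mentioned in the module
      introduction).  A generator cannot produce more distinct
      shuffles than it has states in its cycle, so any sequence
      with more orderings than that must leave some unreachable.

```python
WH_PERIOD = 6953607871644   # assumed Wichmann-Hill period (about 7e12)

def first_length_exceeding(period):
    "Smallest n for which n! (number of orderings) exceeds 'period'"
    n, orderings = 1, 1
    while orderings <= period:
        n += 1
        orderings *= n
    return n

n = first_length_exceeding(WH_PERIOD)
```

      Under that assumption 'n' comes out as 16: a sequence of only
      sixteen items already has more orderings than the generator
      has states.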

  random.uniform(min, max)
      Return a random floating point value in the range
      [min, max).

  random.vonmisesvariate(mu, kappa)
      Return a floating point value with a von Mises
      distribution. 'mu' is the mean angle expressed in radians,
      and 'kappa' is the concentration parameter.

  random.weibullvariate(alpha, beta)
      Return a floating point value with a Weibull distribution.
      'alpha' is the scale parameter, and 'beta' is the shape
      parameter.

  =================================================================
    MODULE -- struct : Create and read packed binary strings
  =================================================================

  The [struct] module allows you to compactly encode Python
  numeric values.  This module may also be used to read C structs
  that use the same formats; some formatting codes are only
  useful for reading C structs.  The exception `struct.error` is
  raised if a format does not match its string or values.

  A format string consists of a sequence of alphabetic formatting
  codes. Each code is represented by zero or more bytes in the
  encoded packed binary string.  Each formatting code may be
  preceded by a number indicating a number of occurrences.  The
  entire format string may be preceded by a global flag.  If the
  flag '@' is used, platform-native data sizes and endianness are
  used.  In all other cases, standard data sizes are used.  The
  flag '=' explicitly indicates platform endianness; '<'
  indicates little-endian representations; '>' or '!' indicates
  big-endian representations.

  The available formatting codes are listed below. The standard
  sizes are given (check your platform for its sizes if
  platform-native sizes are needed).

      #------ Formatting codes for struct module -----#
      x       pad byte             1 byte
      c       char                 1 byte
      b       signed char          1 byte
      B       unsigned char        1 byte
      h       short int            2 bytes
      H       unsigned short       2 bytes
      i       int                  4 bytes
      I       unsigned int         4 bytes
      l       long int             4 bytes
      L       unsigned long        4 bytes
      q       long long int        8 bytes
      Q       unsigned long long   8 bytes
      f       float                4 bytes
      d       double               8 bytes
      s       string               padded to size
      p       Pascal string        padded to size
      P       char pointer         4 bytes

  Some usage examples clarify the encoding:

      >>> import struct
      >>> struct.pack('5s5p2c', 'sss','ppp','c','c')
      'sss\x00\x00\x03ppp\x00cc'
      >>> struct.pack('h', 1)
      '\x00\x01'
      >>> struct.pack('I', 1)
      '\x00\x00\x00\x01'
      >>> struct.pack('l', 1)
      '\x00\x00\x00\x01'
      >>> struct.pack('<l', 1)
      '\x01\x00\x00\x00'
      >>> struct.pack('f', 1)
      '?\x80\x00\x00'
      >>> struct.pack('hil', 1,2,3)
      '\x00\x01\x00\x00\x00\x00\x00\x02\x00\x00\x00\x03'
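
  The outputs above reflect a big-endian machine using native
  alignment.  A small sketch of how the global flags remove that
  platform dependence; the values packed are arbitrary.

```python
import struct

big = struct.pack('>h', 1)        # big-endian short
little = struct.pack('<h', 1)     # little-endian short

# With an explicit flag, standard sizes are used and no alignment
# padding is inserted: 2 + 4 + 4 bytes.
packed = struct.pack('>hil', 1, 2, 3)
size = struct.calcsize('>hil')

# Native mode ('@', the default) may insert pad bytes after the short.
native_size = struct.calcsize('@hil')
```

  Whatever the platform, 'size' is 10, while 'native_size' depends
  on the local C compiler's sizes and alignment rules.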

  FUNCTIONS:

  struct.calcsize(fmt)
      Return the length of the string that corresponds to the
      format 'fmt'.

  struct.pack(fmt, v1 [,v2 [...]])
      Return a string with values 'v1', et alia, packed according
      to the format 'fmt'.

  struct.unpack(fmt, s)
      Return a tuple of values represented by string 's' packed
      according to the format 'fmt'.
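
      The three functions are typically used together to read and
      write fixed-width records.  A sketch under assumptions: the
      record layout 'RECORD' and the helper names are hypothetical,
      chosen only to show a round trip.

```python
import struct

# Hypothetical record layout: 8-byte name, unsigned short, double,
# all big-endian with standard sizes.
RECORD = '>8sHd'

def encode(name, count, weight):
    "Pack one record; 'name' is null-padded (or truncated) to 8 bytes"
    return struct.pack(RECORD, name, count, weight)

def decode(data):
    "Unpack one record, stripping the null padding from the name"
    name, count, weight = struct.unpack(RECORD, data)
    return name.rstrip(b'\x00'), count, weight

data = encode(b'widget', 12, 2.5)
record = decode(data)
```

      The length of 'data' always equals 'struct.calcsize(RECORD)',
      which is what makes it possible to slice a file of such
      records at fixed offsets.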

  =================================================================
    MODULE -- time : Functions to manipulate date/time values
  =================================================================

  The [time] module is useful both for computing and displaying
  dates and time increments, and for simple benchmarking of
  applications and functions.  For some purposes, eGenix.com's
  [mx.DateTime] module is more useful for manipulating datetimes
  than is [time].  You may obtain [mx.DateTime] from:

    <http://egenix.com/files/python/eGenix-mx-Extensions.html>

  Time tuples--used by several functions--consist of year, month,
  day, hour, minute, second, weekday, Julian day, and Daylight
  Savings flag.  All values are integers.  Month, day, and Julian
  day (day of year) are one-based; hour, minute, second, and
  weekday are zero-based (Monday is 0).  The Daylight Savings
  flag uses 1 for DST, 0 for Standard Time, and -1 for "best
  guess."
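
  The field positions can be given names to make code that indexes
  time tuples easier to read.  A sketch, assuming the usual Unix
  epoch of 1970; the constant names are my own and are not defined
  by the [time] module.

```python
import time

# Positions of the nine fields, in the order listed above.
YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEKDAY, YDAY, DST = range(9)

epoch = time.gmtime(0)        # start of the Unix epoch, in UTC
date_part = (epoch[YEAR], epoch[MONTH], epoch[DAY])
weekday = epoch[WEEKDAY]      # 3 means Thursday, since Monday is 0
day_of_year = epoch[YDAY]     # one-based, so January 1 is day 1
```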

  CONSTANTS AND ATTRIBUTES:

  time.accept2dyear
      Boolean to allow two-digit years in date tuples.  The
      default is a true value, in which case the first matching
      date since 'time.gmtime(0)' is extrapolated.

      >>> import time
      >>> time.accept2dyear
      1
      >>> time.localtime(time.mktime((99,1,1,0,0,0,0,0,0)))
      (1999, 1, 1, 0, 0, 0, 4, 1, 0)
      >>> time.gmtime(0)
      (1970, 1, 1, 0, 0, 0, 3, 1, 0)

  time.altzone
  time.daylight
  time.timezone
  time.tzname
      These several constants show information on the current
      timezone.  Different locations use Daylight Savings
      adjustments during different portions of the year,
      usually but not always a one-hour adjustment.
      `time.daylight` indicates only whether such an adjustment
      is available in `time.altzone`.  `time.timezone`
      indicates how many seconds west of UTC the current zone
      is; `time.altzone` adjusts that for Daylight Savings if
      possible.  `time.tzname` gives a tuple of strings
      describing the current zone.

      >>> time.daylight, time.tzname
      (1, ('EST', 'EDT'))
      >>> time.altzone, time.timezone
      (14400, 18000)

  FUNCTIONS:

  time.asctime([tuple=time.localtime()])
      Return a string description of a time tuple.

      >>> time.asctime((2002, 10, 25, 1, 51, 48, 4, 298, 1))
      'Fri Oct 25 01:51:48 2002'

      SEE ALSO, `time.ctime()`, `time.strftime()`

  time.clock()
      Return the processor time for the current process.  The raw
      value returned has little inherent meaning, but the value
      is guaranteed to increase roughly in proportion to the
      amount of CPU time used by the process.  This makes
      `time.clock()` useful for comparative benchmarking of
      various operations or approaches.  The values returned
      should not be compared between different CPUs, OSs, and so
      on, but are meaningful on one machine.  For example:

      #*---------- Use of time.clock() for benchmarking --------#
      import time
      start1 = time.clock()
      approach_one()
      time1 = time.clock()-start1
      start2 = time.clock()
      approach_two()
      time2 = time.clock()-start2
      if time1 > time2:
          print "The second approach seems better"
      else:
          print "The first approach seems better"

      Always use `time.clock()` for benchmarking rather than
      `time.time()`.  The latter is a low-resolution "wall clock"
      only.

  time.ctime([seconds=time.time()])
      Return a string description of 'seconds' since epoch.

      >>> time.ctime(1035526125)
      'Fri Oct 25 02:08:45 2002'

      SEE ALSO, `time.asctime()`

  time.gmtime([seconds=time.time()])
      Return a time tuple of 'seconds' since epoch, giving
      Greenwich Mean Time.

      >>> time.gmtime(1035526125)
      (2002, 10, 25, 6, 8, 45, 4, 298, 0)

      SEE ALSO, `time.localtime()`

  time.localtime([seconds=time.time()])
      Return a time tuple of 'seconds' since epoch, giving
      the local time.

      >>> time.localtime(1035526125)
      (2002, 10, 25, 2, 8, 45, 4, 298, 1)

      SEE ALSO, `time.gmtime()`, `time.mktime()`

  time.mktime(tuple)
      Return a number of seconds since epoch corresponding to a
      time tuple.

      >>> time.mktime((2002, 10, 25, 2, 8, 45, 4, 298, 1))
      1035526125.0
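
      In effect `time.mktime()` undoes `time.localtime()`.  A
      minimal sketch of the round trip, reusing the timestamp from
      the examples in this section:

```python
import time

seconds = 1035526125                    # the timestamp used above
as_tuple = time.localtime(seconds)      # expressed in the local timezone
back = time.mktime(as_tuple)            # and converted back again

# The UTC tuple for the same instant may differ from the local one,
# but both describe the same number of seconds since epoch.
utc_tuple = time.gmtime(seconds)
```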

      SEE ALSO, `time.localtime()`

  time.sleep(seconds)
      Suspend execution for approximately 'seconds' measured in
      "wall clock" time (not CPU time).  The argument 'seconds'
      is a floating point value (precision subject to system
      timer).  The function is fully thread safe.

  time.strftime(format [,tuple=time.localtime()])
      Return a custom string description of a time tuple.  The
      format given in the string 'format' may contain the
      following fields:  '%a'/'%A'/'%w' for abbreviated/full/decimal
      weekday name; '%b'/'%B'/'%m' for abbreviated/full/decimal
      month; '%y'/'%Y' for abbreviated/full year;  '%d' for
      day-of-month; '%H'/'%I' for 24/12 clock hour; '%j' for
      day-of-year; '%M' for minute; '%p' for AM/PM; '%S' for
      seconds; '%U'/'%W' for week-of-year (Sunday/Monday start);
      '%c'/'%x'/'%X' for locale-appropriate datetime/date/time;
      '%Z' for timezone name.  Other characters may occur in the
      format also and will appear as literals (a literal '%' can
      be escaped).

      >>> import time
      >>> tuple = (2002, 10, 25, 2, 8, 45, 4, 298, 1)
      >>> time.strftime("%A, %B %d '%y (week %U)", tuple)
      "Friday, October 25 '02 (week 42)"

      SEE ALSO, `time.asctime()`, `time.ctime()`, `time.strptime()`

  time.strptime(s [,format="%a %b %d %H:%M:%S %Y"])
      Return a time tuple based on a string description of a
      time.  The format given in the string 'format' follows the
      same rules as in `time.strftime()`.  Not available on most
      platforms.
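
      On an interpreter that does provide `time.strptime()`, it
      inverts `time.strftime()` for the fields the string records.
      A sketch using numeric fields only, so that the result does
      not depend on locale-specific day or month names:

```python
import time

fmt = "%Y-%m-%d %H:%M:%S"               # numeric fields only
stamp = (2002, 10, 25, 2, 8, 45, 4, 298, 1)

text = time.strftime(fmt, stamp)        # tuple -> string
parsed = time.strptime(text, fmt)       # string -> tuple
```

      The first six fields of 'parsed' match those of 'stamp'.
      Information the string does not carry, such as the Daylight
      Savings flag, is not recovered.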

      SEE ALSO, `time.strftime()`

  time.time()
      Return the number of seconds since the epoch for the
      current time.  You can specifically determine the epoch
      using 'time.ctime(0)', but normally you will use other
      functions in the [time] module to generate useful values.
      Even though `time.time()` is also generally nondecreasing
      in its return values, you should use `time.clock()` for
      benchmarking purposes.

      >>> time.ctime(0)
      'Wed Dec 31 19:00:00 1969'
      >>> time.time()
      1035585490.484154
      >>> time.ctime(1035585437)
      'Fri Oct 25 18:37:17 2002'

      SEE ALSO, `time.clock()`, `time.ctime()`

  SEE ALSO, `calendar`

SECTION 3 -- Other Modules in the Standard Library
------------------------------------------------------------------------

  If your application performs other types of tasks besides text
  processing, a skim of this module list can suggest where to look
  for relevant functionality. As well, readers who find themselves
  maintaining code written by other developers may find that
  unfamiliar modules are imported by the existing code. If an
  imported module is not summarized in the list below, nor
  documented elsewhere, it is probably an in-house or third-party
  module. For standard library modules, the summaries here will at
  least give you a sense of the general purpose of a given module.

  __builtin__
      Access to built-in functions, exceptions, and other
      objects.  Python does a great job of exposing its own
      internals, but "normal" developers do not need to worry
      about this.

  TOPIC -- Serializing and Storing Python Objects
  --------------------------------------------------------------------

  In object-oriented programming (OOP) languages like Python,
  compound data and structured data are frequently represented at
  runtime as native objects. At times these objects belong to basic
  datatypes--lists, tuples, dictionaries--but more often, once you
  reach a certain degree of complexity, hierarchies of instances
  containing attributes become more likely.

  For simple objects, especially sequences, serialization and
  storage are rather straightforward.  For example, lists can
  easily be represented in delimited or fixed-length strings.
  Lists-of-lists can be saved in line oriented files, each line
  containing delimited fields, or in rows of RDBMS tables.  But
  once the dimension of nested sequences goes past two, and even
  more so for heterogeneous data structures, traditional
  table-oriented storage is a less-obvious fit.

  While it is -possible- to create "object/relational adaptors"
  that write OOP instances to flat tables, that usually requires
  custom programming. A number of more general solutions exist,
  both in the Python standard library and in third-party tools.
  There are actually two separate issues involved in storing Python
  objects. The first issue is how to convert them into strings in
  the first place; the second issue is how to create a general
  persistence mechanism for such serialized objects. At a minimal
  level, of course, it is simple enough to store (and retrieve) a
  serialization string the same way you would any other string--to
  a file, a database, and so on. The various [*dbm] modules create a
  "dictionary on disk," while the [shelve] module automatically
  utilizes [cPickle] serialization to write arbitrary objects as
  values (keys are still strings).
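
  As a rough illustration of the [shelve] approach, the sketch
  below stores a nested object under a string key and reads it back
  after reopening the shelf.  The filename 'demo_shelf' and the
  stored data are invented for the example.

```python
import os
import shelve
import shutil
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo_shelf')   # hypothetical filename

# Values may be arbitrary picklable objects; keys are plain strings.
shelf = shelve.open(path)
shelf['chapter1'] = {'title': 'Python Basics', 'sections': [1, 2, 3]}
shelf.close()

# Reopen the shelf and read the object back, already deserialized.
shelf = shelve.open(path)
restored = shelf['chapter1']
shelf.close()

shutil.rmtree(workdir)                       # remove the scratch files
```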

  Several third-party modules support object serialization with
  special features. If you need an XML dialect for your object
  representation, the modules [gnosis.xml.pickle] and [xmlrpclib]
  are useful. The YAML format is both human-readable/editable and
  has support libraries for Python, Perl, Ruby, and Java; using
  these various libraries, you can exchange objects between these
  several programming languages.

  SEE ALSO, `gnosis.xml.pickle`, `yaml`, `xmlrpclib`

  =================================================================
    MODULES -- DBM : Interfaces to dbm-style databases
  =================================================================

  A dbm-style database is a "dictionary on disk."  Using a
  database of this sort allows you to store a set of key/val
  pairs to a file, or files, on the local filesystem, and to
  access and set them as if they were an in-memory dictionary.  A
  dbm-style database, unlike a standard dictionary, always maps
  strings to strings.  If you need to store other types of
  objects, you will need to convert them to strings (or use the
  [shelve] module as a wrapper).

  Depending on your platform, and on which external libraries are
  installed, different dbm modules might be available. The
  performance characteristics of the various modules vary
  significantly. As well, some DBM modules support some special
  functionality. Most of the time, however, your best approach is
  to access the locally supported DBM module using the wrapper
  module [anydbm]. Calls to this module will select the best
  available DBM for the current environment without a programmer or
  user having to worry about the underlying support mechanism.

  Functions and methods are documented using the nonspecific
  capitalized form 'DBM'. In real usage, you would use the name of
  a specific module. Most of the time, you will get or set DBM
  values using standard named indexing, for example, 'db["key"]'. A
  few methods characteristic of dictionaries are also supported, as
  well as a few methods special to DBM databases.

  SEE ALSO, [shelve], [dict], [UserDict]

  FUNCTIONS:

  DBM.open(fname [,flag="r" [,mode=0666]])
      Open the filename 'fname' for dbm access.  The optional
      argument 'flag' specifies how the database is accessed.
      A value of 'r' is for read-only access (on an existing
      dbm file); 'w' opens an already existing file for
      read/write access; 'c' will create a database or use an
      existing one, with read/write access; the option 'n' will
      always create a new database, erasing the one named in
      'fname' if it already existed.  The optional 'mode'
      argument specifies the Unix mode of the file(s) created.
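
      A sketch of the common 'c' then 'r' sequence using the
      [anydbm] wrapper.  Two assumptions are flagged in the code:
      later Python releases ship the same wrapper under the name
      'dbm', and keys and values are written as byte strings so
      that the example behaves the same across versions.

```python
import os
import shutil
import tempfile

try:
    import anydbm                  # the name used in this chapter
except ImportError:
    import dbm as anydbm           # assumption: later name for the wrapper

workdir = tempfile.mkdtemp()
fname = os.path.join(workdir, 'demo_db')     # hypothetical filename

db = anydbm.open(fname, 'c')       # 'c': create if missing, read/write
db[b'spam'] = b'eggs'              # keys and values are (byte) strings
db.close()

db = anydbm.open(fname, 'r')       # 'r': read-only, database must exist
value = db[b'spam']
keys = list(db.keys())
db.close()

shutil.rmtree(workdir)             # remove the scratch database
```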

  METHODS:

  DBM.close()
      Close the database and flush any pending writes.

  DBM.first()
      Return the first key/val pair in the DBM.  The order is
      arbitrary but stable.  You may use the `DBM.first()` method,
      combined with repeated calls to `DBM.next()`, to process
      every item in the dictionary.

      In Python 2.2+, you can implement an 'items()' function to
      emulate the behavior of the '.items()' method of dictionaries
      for DBMs:

      >>> from __future__ import generators
      >>> def items(db):
      ...     try:
      ...         yield db.first()
      ...         while 1:
      ...             yield db.next()
      ...     except KeyError:
      ...         return
      ...
      >>> for k,v in items(d):   # typical usage
      ...     print k,v

  DBM.has_key(key)
      Return a true value if the DBM has the key 'key'.

  DBM.keys()
      Return a list of string keys in the DBM.

  DBM.last()
      Return the last key/val pair in the DBM.  The order is
      arbitrary but stable.  You may use the `DBM.last()` method,
      combined with repeated calls to `DBM.previous()`, to process
      every item in the dictionary in reverse order.

  DBM.next()
      Return the next key/val pair in the DBM.  A pointer to the
      current position is always maintained, so the methods
      `DBM.next()` and `DBM.previous()` can be used to access
      relative items.

  DBM.previous()
      Return the previous key/val pair in the DBM.  A pointer to
      the current position is always maintained, so the methods
      `DBM.next()` and `DBM.previous()` can be used to access
      relative items.

  DBM.sync()
      Force any pending data to be written to disk.

      SEE ALSO, `FILE.flush()`

  MODULES:

  anydbm
      Generic interface to underlying DBM support.  Calls to this
      module use the functionality of the "best available" DBM
      module.  If you open an existing database file, its type is
      guessed and used--assuming the current machine supports
      that style.

      SEE ALSO, `whichdb`

  bsddb
      Interface to the Berkeley DB library.

  dbhash
      Interface to the BSD DB library.

  dbm
      Interface to the Unix (n)dbm library.

  dumbdbm
      Interface to slow, but portable pure Python DBM.

  gdbm
      Interface to the GNU DBM (GDBM) library.

  whichdb
      Guess which db package to use to open a db file.  This
      module contains the single function `whichdb.whichdb()`.
      If you open an existing DBM file with [anydbm], this
      function is called automatically behind the scenes.

  SEE ALSO, [shelve]

  =================================================================
    MODULE -- cPickle : Fast Python object serialization
  =================================================================
    MODULE -- pickle : Standard Python object serialization
  =================================================================

  The module [cPickle] is a comparatively fast C implementation of
  the pure Python [pickle] module. The streams produced and read by
  [cPickle] and [pickle] are interchangeable. The only time you
  should prefer [pickle] is in the uncommon case where you wish to
  subclass the pickling base class; [cPickle] is many times faster
  to use.  The class `pickle.Pickler` is not documented here.

  The [cPickle] and [pickle] modules support both a binary and an
  ASCII format. Neither is designed for human readability, but it
  is not hugely difficult to read an ASCII pickle.  Nonetheless,
  if readability is a goal, [yaml] or [gnosis.xml.pickle] are
  better choices.  Binary format produces smaller pickles that
  are faster to write or load.
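
  The size difference between the two formats is easy to check. The
  sketch below pickles one illustrative structure both ways; the
  name 'data' and its contents are invented for the example, and
  the fallback import is an assumption that lets the code run where
  [cPickle] is not available. A false second argument to 'dumps()'
  selects the ASCII format and a true one selects binary (later
  Python versions call this argument 'protocol').

```python
try:
    import cPickle as pickle      # Python 2: fast C implementation
except ImportError:
    import pickle                 # fallback where cPickle does not exist

data = {'words': ['spam', 'eggs'] * 50, 'counts': list(range(100))}
ascii_form = pickle.dumps(data, 0)     # false second argument: ASCII
binary_form = pickle.dumps(data, 1)    # true second argument: binary

# Both streams restore an equal object; only the encoding differs.
same = pickle.loads(ascii_form) == pickle.loads(binary_form) == data
sizes = (len(ascii_form), len(binary_form))
```

  For this structure the second element of 'sizes' is the smaller
  of the two; the exact numbers vary between Python versions.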

  It is possible to fine-tune the pickling behavior of objects by
  defining the methods '.__getstate__()', '.__setstate__()', and
  '.__getinitargs__()'. The particular black magic invocations
  involved in defining these methods, however, are not addressed in
  this book and are rarely necessary for "normal" objects (i.e.,
  those that represent data structures).

  Use of the [cPickle] or [pickle] module is quite simple:

      >>> import cPickle
      >>> from somewhere import my_complex_object
      >>> s = cPickle.dumps(my_complex_object)
      >>> new_obj = cPickle.loads(s)

  FUNCTIONS:

  pickle.dump(o, file [,bin=0])
  cPickle.dump(o, file [,bin=0])
      Write a serialized form of the object 'o' to the file-like
      object 'file'.  If the optional argument 'bin' is given a true
      value, use binary format.

  pickle.dumps(o [,bin=0])
  cPickle.dumps(o [,bin=0])
      Return a serialized form of the object 'o' as a string.  If
      the optional argument 'bin' is given a true value, use binary
      format.

  pickle.load(file)
  cPickle.load(file)
      Return an object that was serialized as the contents of the
      file-like object 'file'.

  pickle.loads(s)
  cPickle.loads(s)
      Return an object that was serialized in the string 's'.
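
  The file-based functions work with any file-like object, and
  several pickles may be written to one stream. The following is a
  sketch under that assumption, using an in-memory 'io.BytesIO'
  buffer in place of a real file opened in binary mode; the
  variable names are illustrative only.

```python
import io
try:
    import cPickle as pickle
except ImportError:
    import pickle                  # fallback where cPickle does not exist

records = [('alice', 38), ('bob', 27)]
stream = io.BytesIO()              # stands in for a file opened in binary mode
pickle.dump(records, stream)       # first pickle
pickle.dump({'total': 2}, stream)  # second pickle, appended to the same stream
stream.seek(0)                     # rewind before reading
first = pickle.load(stream)        # each load() reads exactly one pickle
second = pickle.load(stream)
```

  Objects come back in the order they were written, so the reader
  needs to know (or store) how many pickles to expect.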

  SEE ALSO, `gnosis.xml.pickle`, `yaml`

  marshal
      Internal Python object serialization.  For more general
      object serialization, use [pickle], [cPickle], or
      [gnosis.xml.pickle], or the YAML tools at
      <http://yaml.org>; [marshal] is a limited-purpose
      serialization to the pseudo-compiled byte-code format used
      by Python '.pyc' files.
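
  To make the "limited-purpose" description concrete, the sketch
  below round-trips a structure built only from basic types and
  then shows that an instance of a user-defined class is refused.
  The class 'Container' and the variable names are illustrative.

```python
import marshal

basic = {'num': 38, 'seq': [1, 2.5, 'three'], 'tup': ('t', 'u', 'p')}
restored = marshal.loads(marshal.dumps(basic))   # basic types round-trip

class Container:
    pass

try:
    marshal.dumps(Container())      # instances are not supported
    instance_ok = True
except ValueError:                  # raised for unmarshallable objects
    instance_ok = False
```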

  =================================================================
    MODULE -- pprint : Pretty-print basic datatypes
  =================================================================

  The module [pprint] is similar to the built-in function `repr()`
  and the module [repr]. The purpose of [pprint] is to represent
  objects of basic datatypes in a more readable fashion, especially
  in cases where collection types nest inside each other. In simple
  cases `pprint.pformat` and `repr()` produce the same result; for
  more complex objects, [pprint] uses newlines and indentation to
  illustrate the structure of a collection. Where possible, the
  string representation produced by [pprint] functions can be used
  to re-create objects with the built-in `eval()`.

  I find the module [pprint] somewhat limited in that it does not
  produce a particularly helpful representation of objects of
  custom types, which might themselves represent compound data.
  Instance attributes are very frequently used in a manner
  similar to dictionary keys.  For example:

      >>> import pprint
      >>> dct = {1.7:2.5, ('t','u','p'):['l','i','s','t']}
      >>> dct2 = {'this':'that', 'num':38, 'dct':dct}
      >>> class Container: pass
      ...
      >>> inst = Container()
      >>> inst.this, inst.num, inst.dct = 'that', 38, dct
      >>> pprint.pprint(dct2)
      {'dct': {('t', 'u', 'p'): ['l', 'i', 's', 't'], 1.7: 2.5},
       'num': 38,
       'this': 'that'}
      >>> pprint.pprint(inst)
      <__main__.Container instance at 0x415770>

  In the example, 'dct2' and 'inst' have the same structure, and
  either might plausibly be chosen in an application as a data
  container. But the latter [pprint] representation only tells us
  the barest information about -what- an object is, not what data
  it contains. The mini-module below enhances pretty-printing:

      #--------------------- pprint2.py ------------------------#
      from pprint import pformat
      import string, sys
      def pformat2(o):
          if hasattr(o,'__dict__'):
              lines = []
              klass = o.__class__.__name__
              module = o.__module__
              desc = '<%s.%s instance at 0x%x>' % (module, klass, id(o))
              lines.append(desc)
              for k,v in o.__dict__.items():
                  lines.append('instance.%s=%s' % (k, pformat(v)))
              return string.join(lines,'\n')
          else:
              return pformat(o)

      def pprint2(o, stream=sys.stdout):
          stream.write(pformat2(o)+'\n')

  Continuing the session above, we get a more useful report:

      >>> import pprint2
      >>> pprint2.pprint2(inst)
      <__main__.Container instance at 0x415770>
      instance.this='that'
      instance.dct={('t', 'u', 'p'): ['l', 'i', 's', 't'], 1.7: 2.5}
      instance.num=38

  FUNCTIONS:

  pprint.isreadable(o)
      Return a true value if the equality below holds:

      #*------------ Round-tripping with pprint ----------------#
      o == eval(pprint.pformat(o))

  pprint.isrecursive(o)
      Return a true value if the object 'o' contains recursive
      containers.  Objects that contain themselves at any nested
      level cannot be restored with `eval()`.

  pprint.pformat(o)
      Return a formatted string representation of the object 'o'.

  pprint.pprint(o [,stream=sys.stdout])
      Print the formatted representation of the object 'o' to the
      file-like object 'stream'.
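
  The two predicate functions can be tried out on a plain nested
  collection and on a list that contains itself. This is a small
  sketch; the names 'nested' and 'loop' are invented for the
  example.

```python
import pprint

nested = {'num': 38, 'seq': ['l', 'i', 's', 't'], 'tup': ('t', 'u', 'p')}
loop = [1, 2]
loop.append(loop)                          # the list now contains itself

readable = pprint.isreadable(nested)       # true: eval() can rebuild it
rebuilt = eval(pprint.pformat(nested))     # round-trip through a string
recursive = pprint.isrecursive(loop)       # true: self-reference found
loop_readable = pprint.isreadable(loop)    # false: recursion blocks eval()
```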

  CLASSES:

  pprint.PrettyPrinter(width=80, depth=..., indent=1, stream=sys.stdout)
      Return a pretty-printing object that will format using a
      width of 'width', will limit recursion to depth 'depth',
      and will indent each new level by 'indent' spaces.  The
      method `pprint.PrettyPrinter.pprint()` will write to the
      file-like object 'stream'.

        >>> pp = pprint.PrettyPrinter(width=30)
        >>> pp.pprint(dct2)
        {'dct': {1.7: 2.5,
                 ('t', 'u', 'p'): ['l',
                                   'i',
                                   's',
                                   't']},
         'num': 38,
         'this': 'that'}

  METHODS:

  The class `pprint.PrettyPrinter` has the same methods as the
  module level functions. The only difference is that the stream
  used for `pprint.PrettyPrinter.pprint()` is configured when an
  instance is initialized rather than passed as an optional
  argument.
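
  The difference in how the stream is supplied can be sketched as
  follows, with an in-memory buffer standing in for an output file.
  The dictionary contents are illustrative, and the fallback import
  is an assumption that lets the same lines run on Python 3.

```python
import pprint
try:
    from StringIO import StringIO     # Python 2
except ImportError:
    from io import StringIO           # Python 3 (assumption)

dct2 = {'this': 'that', 'num': 38, 'seq': ['l', 'i', 's', 't']}
buf = StringIO()                      # any file-like object will do
pp = pprint.PrettyPrinter(width=30, stream=buf)
pp.pprint(dct2)                       # written to buf, not to sys.stdout
text = buf.getvalue()

# The .pformat() method returns the same text, minus the final newline.
same = (pp.pformat(dct2) + '\n' == text)
```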

  SEE ALSO, `gnosis.xml.pickle`, `yaml`

  =================================================================
    MODULE -- repr : Alternative object representation
  =================================================================

  The module [repr] contains code for customizing the string
  representation of objects. In its default behavior the function
  `repr.repr()` provides a length-limited string representation of
  objects--in the case of large collections, displaying the entire
  collection can be unwieldy, and unnecessary for merely
  distinguishing objects.  For example:

      >>> dct = dict([(n,str(n)) for n in range(6)])
      >>> repr(dct)     # much worse for, e.g., 1000 item dict
      "{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'}"
      >>> from repr import repr
      >>> repr(dct)
      "{0: '0', 1: '1', 2: '2', 3: '3', ...}"
      >>> `dct`
      "{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'}"

  The back-tick operator does not change behavior if the built-in
  `repr()` function is replaced.

  You can change the behavior of `repr.repr()` by modifying
  attributes of the instance object `repr.aRepr`.

      >>> dct = dict([(n,str(n)) for n in range(6)])
      >>> repr(dct)
      "{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'}"
      >>> import repr
      >>> repr.repr(dct)
      "{0: '0', 1: '1', 2: '2', 3: '3', ...}"
      >>> repr.aRepr.maxdict = 5
      >>> repr.repr(dct)
      "{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', ...}"

  In my opinion, the choice of the name for this module is
  unfortunate, since it is identical to that of the built-in
  function.  You can avoid some of the collision by using the
  'as' form of importing, as in:

      >>> import repr as _repr
      >>> from repr import repr as newrepr

  For fine-tuned control of object representation, you may subclass
  the class `repr.Repr`. Potentially, you could use substitutable
  'repr()' functions to change the behavior of application output,
  but if you anticipate such a need, it is better practice to give
  a name that indicates this, for example, 'overridable_repr()'.
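
  One way to follow that advice, sketched here under a few
  assumptions, is to configure a private `repr.Repr` instance and
  bind its method to a distinct name, leaving both the built-in
  `repr()` and the shared `repr.aRepr` alone. The name
  'overridable_repr' comes from the paragraph above; 'short' is
  illustrative. The fallback import reflects the fact that Python 3
  renamed this module to 'reprlib'.

```python
try:
    import reprlib                 # Python 3 name of the module
except ImportError:
    import repr as reprlib         # Python 2 name

short = reprlib.Repr()             # private instance; aRepr is untouched
short.maxlist = 3                  # show at most three list items
overridable_repr = short.repr      # bound method under a distinct name

numbers = list(range(10))
abbreviated = overridable_repr(numbers)    # '[0, 1, 2, ...]'
full = repr(numbers)                       # the built-in is unaffected
```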

  CLASSES:

  repr.Repr()
      Base for customized object representations.  The instance
      `repr.aRepr` automatically exists in the module namespace,
      so this class is useful primarily as a parent class.  To
      change an attribute, it is simplest just to set it in an
      instance.

  ATTRIBUTES:

  repr.maxlevel
      Depth of recursive objects to follow.

  repr.maxdict
  repr.maxlist
  repr.maxtuple
      Number of items in a collection of the indicated type to
      include in the representation.  Sequences default to 6,
      dicts to 4.

  repr.maxlong
      Number of digits of a long integer to stringify.  Default
      is 40.

  repr.maxstring
      Length of string representation (e.g., 's[:N]').  Default
      is 30.

  repr.maxother
      "Catch-all" maximum length of other representations.

  FUNCTIONS:

  repr.repr(o)
      Behaves like built-in `repr()`, but potentially with a
      different string representation created.

  repr.repr_TYPE(o, level)
      Represent an object of the type 'TYPE', where the names
      used are the standard type names.  The argument 'level'
      indicates the level of recursion when this method is called
      (you might want to decide what to print based on how deep
      within the representation the object is).  The _Python
      Library Reference_ gives the example:

      #*--------------- Custom Repr class ---------------------#
      import repr, sys
      class MyRepr(repr.Repr):
          def repr_file(self, obj, level):
              if obj.name in ['<stdin>', '<stdout>', '<stderr>']:
                  return obj.name
              else:
                  return `obj`
      aRepr = MyRepr()
      print aRepr.repr(sys.stdin)          # prints '<stdin>'

  =================================================================
    MODULE -- shelve : General persistent dictionary
  =================================================================

  The module [shelve] builds on the capabilities of the DBM
  modules, but takes things a step further.  Unlike with the DBM
  modules, you may write arbitrary Python objects as values in a
  [shelve] database.  The keys in [shelve] databases, however,
  must still be strings.

  The methods of [shelve] databases are generally the same as those
  for their underlying DBMs. However, shelves do not have the
  '.first()', '.last()', '.next()', or '.previous()' methods; nor
  do they have the '.items()' method that actual dictionaries do.
  Most of the time you will simply use name-indexed assignment and
  access. But from time to time, the available `shelve.get()`,
  `shelve.keys()`, `shelve.sync()`, `shelve.has_key()`, and
  `shelve.close()` methods are useful.

  Usage of a shelve consists of a few simple steps like the
  ones below:

      >>> import shelve
      >>> sh = shelve.open('test_shelve')
      >>> sh.keys()
      ['this']
      >>> sh['new_key'] = {1:2, 3:4, ('t','u','p'):['l','i','s','t']}
      >>> sh.keys()
      ['this', 'new_key']
      >>> sh['new_key']
      {1: 2, 3: 4, ('t', 'u', 'p'): ['l', 'i', 's', 't']}
      >>> del sh['this']
      >>> sh.keys()
      ['new_key']
      >>> sh.close()

  In the example, I opened an existing shelve, and the previously
  existing key/value pair was available. Deleting a key/value pair
  is the same as doing so from a standard dictionary. Opening a new
  shelve automatically creates the necessary file(s).
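
  The remaining methods can be exercised in a self-contained way by
  building a fresh shelve in a temporary directory. This is only a
  sketch; the path and key names are invented for the example. It
  loops over `shelve.keys()` as a stand-in for the missing
  '.items()' method.

```python
import os, shelve, tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_shelve')
sh = shelve.open(path)                       # creates the necessary file(s)
sh['lst'] = ['l', 'i', 's', 't']             # values: any picklable object
sh['dct'] = {1: 2, ('t', 'u', 'p'): 3.4}
sh.sync()                                    # flush pending writes to disk
missing = sh.get('no_such_key', 'default')   # dictionary-style fallback
pairs = [(key, sh[key]) for key in sh.keys()]   # key/value pairs by hand
sh.close()

sh = shelve.open(path)                       # reopen: the data persisted
restored = sh['dct']
sh.close()
```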

  Although [shelve] only allows strings to be used as keys, in a
  pinch it is not difficult to generate strings that characterize
  other types of immutable objects.  For the same reasons that you
  do not generally want to use mutable objects as dictionary
  keys, it is also a bad idea to use mutable objects as [shelve]
  keys.  Using the built-in `hash()` function is a good way to
  generate strings--but keep in mind that this technique does not
  strictly guarantee uniqueness, so it is possible (but unlikely)
  to accidentally overwrite entries using this hack:

      >>> '%x' % hash((1,2,3,4,5))
      '866123f4'
      >>> '%x' % hash(3.1415)
      '6aad0902'
      >>> '%x' % hash(38)
      '26'
      >>> '%x' % hash('38')
      '92bb58e3'

  Integers, notice, are their own hash, and strings of digits are
  common.  Therefore, if you adopted this approach, you would
  want to hash strings as well, before using them as keys.  There
  is no real problem with doing so, merely an extra indirection
  step that you need to remember to use consistently:

      >>> sh['%x' % hash('another_key')] = 'another value'
      >>> sh.keys()
      ['new_key', '8f9ef0ca']
      >>> sh['%x' % hash('another_key')]
      'another value'
      >>> sh['another_key']
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        File "/sw/lib/python2.2/shelve.py", line 70, in __getitem__
          f = StringIO(self.dict[key])
      KeyError: another_key
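
  If the possibility of hash collisions is a concern, one
  alternative (a swapped-in technique, not the `hash()` approach
  shown above) is to use the `repr()` of a simple immutable object
  as its key. The helper 'as_key()' below is hypothetical, and a
  plain dictionary stands in for the shelve so that the sketch is
  self-contained.

```python
def as_key(obj):
    """Return a string key characterizing a simple immutable object.

    Hypothetical helper: for integers, strings, and tuples of them,
    repr() yields a distinct string for each distinct value, so 38
    and '38' get different keys.
    """
    return repr(obj)

store = {}                          # a shelve would be used the same way
store[as_key((1, 2, 3))] = 'a tuple'
store[as_key(38)] = 'an integer'
store[as_key('38')] = 'a string of digits'
```

  As with the hashing approach, the indirection must be applied
  consistently on both storage and retrieval.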

  If you want to go beyond the capabilities of [shelve] in several
  ways, you might want to investigate the third-party library Zope
  Object Database (ZODB). ZODB allows arbitrary objects to be
  persistent, not only dictionary-like objects. Moreover, ZODB lets
  you store data in ways other than in local files, and also has
  adaptors for multiuser simultaneous access. Look for details at:

    <http://www.zope.org/Wikis/ZODB/StandaloneZODB>

  SEE ALSO, [DBM], [dict]

  -*-

  The rest of the listed modules are comparatively unlikely to be
  needed in text processing applications. Some modules are specific
  to a particular platform; if so, this is indicated
  parenthetically. Recent distributions of Python have taken a
  "batteries included" approach--a base Python distribution
  includes much more than the base distributions of other free
  programming languages (although other popular languages have a
  range of existing libraries that can be downloaded separately).


  TOPIC -- Platform-Specific Operations
  --------------------------------------------------------------------

  _winreg
      Access to the Windows registry (Windows).

  AE
      AppleEvents (Macintosh; replaced by [Carbon.AE]).

  aepack
      Conversion between Python variables and AppleEvent data
      containers (Macintosh).

  aetypes
      AppleEvent objects (Macintosh).

  applesingle
      Rudimentary decoder for AppleSingle format files
      (Macintosh).

  buildtools
      Build MacOS applets (Macintosh).

  calendar
      Print calendars, much like the Unix 'cal' utility.  A
      variety of functions allow you to print or stringify
      calendars for various time frames.  For example:

      >>> import calendar
      >>> print calendar.month(2002,11)
          November 2002
      Mo Tu We Th Fr Sa Su
                   1  2  3
       4  5  6  7  8  9 10
      11 12 13 14 15 16 17
      18 19 20 21 22 23 24
      25 26 27 28 29 30

  Carbon.AE, Carbon.App, Carbon.CF, Carbon.Cm, Carbon.Ctl, Carbon.Dlg,
  Carbon.Evt, Carbon.Fm, Carbon.Help, Carbon.List, Carbon.Menu, Carbon.Mlte,
  Carbon.Qd, Carbon.Qdoffs, Carbon.Qt, Carbon.Res, Carbon.Scrap, Carbon.Snd,
  Carbon.TE, Carbon.Win
      Interfaces to Carbon API (Macintosh).

  cd
      CD-ROM access on SGI systems (IRIX).

  cfmfile
      Code Fragment Resource module (Macintosh).

  ColorPicker
      Interface to the standard color selection dialog
      (Macintosh).

  ctb
      Interface to the Communications Tool Box (Macintosh).

  dl
      Call C functions in shared objects (Unix).

  EasyDialogs
      Basic Macintosh dialogs (Macintosh).

  fcntl
      Access to Unix 'fcntl()' and 'ioctl()' system functions
      (Unix).

  findertools
      AppleEvents interface to MacOS finder (Macintosh).

  fl, FL, flp
      Functions and constants for working with the FORMS library
      (IRIX).

  fm, FM
      Functions and constants for working with the Font Manager
      library (IRIX).

  fpectl
      Floating point exception control (Unix).

  FrameWork, MiniAEFrame
      Structured development of MacOS applications (Macintosh).

  gettext
      The module [gettext] eases the development of multilingual
      applications.  While actual translations must be performed
      manually, this module aids in identifying strings for
      translation and runtime substitutions of language-specific
      strings.

  grp
      Information on Unix groups (Unix).

  locale
      Control the language and regional settings for an
      application.  The 'locale' setting affects the behavior of
      several functions, such as `time.strftime()` and
      `string.lower()`.  The [locale] module is also useful for
      creating strings such as numbers with grouped digits and
      currency strings for specific nations.

  mac, macerrors, macpath
      Macintosh implementation of [os] module functionality.  It
      is generally better to use [os] directly and let it call
      [mac] where needed (Macintosh).

  macfs, macfsn, macostools
      File system services (Macintosh).

  MacOS
      Access to MacOS Python interpreter (Macintosh).

  macresource
      Locate script resources (Macintosh).

  macspeech
      Interface to Speech Manager (Macintosh).

  mactty
      Easy access to serial line connections (Macintosh).

  mkcwproject
      Create CodeWarrior projects (Macintosh).

  msvcrt
      Miscellaneous Windows-specific functions provided in
      Microsoft's Visual C++ Runtime libraries (Windows).

  Nav
      Interface to Navigation Services (Macintosh).

  nis
      Access to Sun's NIS Yellow Pages (Unix).

  pipes
      Manage pipes at a finer level than done by `os.popen()` and
      its relatives.  Reliability varies between platforms (Unix).

  PixMapWrapper
      Wrap PixMap objects (Macintosh).

  posix, posixfile
      Access to operating system functionality under Unix.  The
      [os] module provides a more portable version of the same
      functionality and should be used instead (Unix).

  preferences
      Application preferences manager (Macintosh).

  pty
      Pseudo terminal utilities (IRIX, Linux).

  pwd
      Access to Unix password database (Unix).

  pythonprefs
      Preferences manager for Python (Macintosh).

  py_resource
      Helper to create PYC resources for compiled applications
      (Macintosh).

  quietconsole
      Buffered, nonvisible STDOUT output (Macintosh).

  resource
      Examine resource usage (Unix).

  syslog
      Interface to Unix syslog library (Unix).

  tty, termios, TERMIOS
      POSIX tty control (Unix).

  W
      Widgets for the Mac (Macintosh).

  waste
      Interface to the WorldScript-Aware Styled Text Engine
      (Macintosh).

  winsound
      Interface to audio hardware under Windows (Windows).

  xdrlib
      Implements (a subset of) Sun eXternal Data Representation
      (XDR).  In concept, [xdrlib] is similar to the [struct]
      module, but the format is less widely used.


  TOPIC -- Working with Multimedia Formats
  --------------------------------------------------------------------

  aifc
      Read and write AIFC and AIFF audio files.  The interface to
      [aifc] is the same as for the [sunau] and [wave] modules.

  al, AL
      Audio functions for SGI (IRIX).

  audioop
      Manipulate raw audio data.

  chunk
      Read chunks of IFF audio data.

  colorsys
      Convert between RGB color model and YIQ, HLS, and HSV color
      spaces.

  gl, DEVICE, GL
      Functions and constants for working with Silicon Graphics'
      Graphics Library (IRIX).

  imageop
      Manipulate image data stored as Python strings.  For most
      operations on image files, the third-party -Python Imaging
      Library- (<http://www.pythonware.com/products/pil/>) is a
      versatile and powerful tool.

  imgfile
      Support for imglib files (IRIX).

  jpeg
      Read and write JPEG files on SGI (IRIX).  The -Python
      Imaging Library- (<http://www.pythonware.com/products/pil/>)
      provides a cross-platform means of working with a large
      number of image formats and is preferable for most
      purposes.

  rgbimg
      Read and write SGI RGB files (IRIX).

  sunau
      Read and write Sun AU audio files.  The interface to
      [sunau] is the same as for the [aifc] and [wave] modules.

  sunaudiodev, SUNAUDIODEV
      Interface to Sun audio hardware (SunOS/Solaris).

  videoreader
      Read QuickTime movies frame by frame (Macintosh).

  wave
      Read and write WAV audio files.  The interface to [wave] is
      the same as for the [aifc] and [sunau] modules.


  TOPIC -- Miscellaneous Other Modules
  --------------------------------------------------------------------

  array
      Typed arrays of numeric values.  More efficient than
      standard Python lists, where applicable.

  atexit
      Exit handlers.  Same functionality as `sys.exitfunc`, but
      different interface.

  BaseHTTPServer, SimpleHTTPServer, SimpleXMLRPCServer, CGIHTTPServer
      HTTP server classes.  [BaseHTTPServer] should usually be
      treated as an abstract class.  The other modules provide
      sufficient customization for usage in the specific context
      indicated by their names.  All may be customized for your
      application's needs.

  Bastion
      Restricted object access.  Used in conjunction with
      [rexec].

  bisect
      List insertion maintaining sort order.

  cmath
      Mathematical functions over complex numbers.

  cmd
      Build line-oriented command interpreters.

  code
      Utilities to emulate Python's interactive interpreter.

  codeop
      Compile possibly incomplete Python source code.

  compileall
      Module/script to compile .py files to cached byte-code
      files.

  compiler, compiler.ast, compiler.visitor
      Analyze Python source code and generate Python byte-codes.

  copy_reg
      Helper to provide extensibility for pickle/cPickle.

  curses, curses.ascii, curses.panel, curses.textpad, curses.wrapper
      Full-screen terminal handling with the (n)curses library.

  dircache
      Cached directory listing.  This module enhances the
      functionality of `os.listdir()`.

  dis
      Disassembler of Python byte-code into mnemonics.

  distutils
      Build and install Python modules and packages.  [distutils]
      provides a standard mechanism for creating distribution
      packages of Python tools and libraries, and also for
      installing them on target machines.  Although [distutils]
      is likely to be useful for text processing applications
      that are distributed to users, a discussion of the details
      of working with [distutils] is outside the scope of this
      book.  Useful information can be found in the Python
      standard documentation, especially Greg Ward's
      _Distributing Python Modules_ and _Installing Python
      Modules_.

  doctest
      Check the accuracy of __doc__ strings.

  errno
      Standard 'errno' system symbols.

  fpformat
      General floating point formatting functions.  Duplicates
      string interpolation functionality.

  gc
      Control Python's (optional) cyclic garbage collection.

  getpass
      Utilities to collect a password without echoing to screen.

  imp
      Access the internals of the 'import' statement.

  inspect
      Get useful information from live Python objects for Python
      2.1+.

  keyword
      Check whether string is a Python keyword.

  math
      Various trigonometric and algebraic functions and constants.
      These functions generally operate on floating point
      numbers--use [cmath] for calculations on complex numbers.

  mutex
      Work with mutual exclusion locks, typically for threaded
      applications.

  new
      Create special Python objects in customizable ways.  For
      example, Python hackers can create a module object without
      using a file of the same name or create an instance while
      bypassing the normal '.__init__()' call.  "Normal"
      techniques generally suffice for text processing
      applications.

  pdb
      A Python debugger.

  popen2
      Functions to spawn commands with pipes to STDIN, STDOUT,
      and optionally STDERR.  In Python 2.0+, this functionality
      is copied to the [os] module in slightly improved form.
      Generally you should use the [os] module (unless you are
      running Python 1.5.2 or earlier).

  profile
      Profile the performance characteristics of Python code.  If
      speed becomes an issue in your application, your first step
      toward a solution should be profiling the code.
      But details of using [profile] are outside the scope of
      this book.  Moreover, it is usually a bad idea to -assume-
      speed is a problem until it is actually found to be so.

  pstats
      Print reports on profiled Python code.

  pyclbr
      Python class browser; useful for implementing code
      development environments for editing Python.

  pydoc
      Extremely useful script and module for examining Python
      documentation.  [pydoc] is included with Python 2.1+, but
      is compatible with earlier versions if downloaded.  [pydoc]
      can provide help similar to Unix 'man' pages, help in the
      interactive shell, and also a Web browser interface to
      documentation.  This tool is worth using frequently while
      developing Python applications, but its details are outside
      the scope of this book.

  py_compile
      "Compile" a .py file to a .pyc (or .pyo) file.

  Queue
      A multiproducer, multiconsumer queue, especially for
      threaded programming.

  readline, rlcompleter
      Interface to GNU readline (Unix).

  rexec
      Restricted execution facilities.

  sched
      General event scheduler.

  signal
      Handlers for asynchronous events.

  site, user
      Customizable startup module that can be modified to change
      the behavior of the local Python installation.

  statcache
      Maintain a cache of `os.stat()` information on files.
      Deprecated in Python 2.2+.

  statvfs
      Constants for interpreting the results of `os.statvfs()`
      and `os.fstatvfs()`.

  thread, threading
      Create multithreaded applications with Python.  Although
      text processing applications--like other
      applications--might use a threaded approach, this topic is
      outside the scope of this book.  Most, but not all, Python
      platforms support threaded applications.

  Tkinter, ScrolledText, Tix, turtle
      Python interface to Tcl/Tk and higher-level widgets for
      Tk. Supported on many platforms, but not on all Python
      installations.

  traceback
      Extract, format, and print information about Python stack
      traces.  Useful for debugging applications.

  unittest
      Unit testing framework.  Like a number of other
      documenting, testing, and debugging modules, [unittest] is
      a useful facility--and its usage is recommended for Python
      applications in general.  But this module is not specific
      enough to text processing applications to be addressed in
      this book.

  warnings
      Python 2.1 added a set of warning messages for conditions a
      user should be aware of, but that fall below the threshold
      for raising exceptions.  By default, such messages are
      printed to STDERR, but the [warnings] module can be used to
      modify the behavior of warning messages.

  weakref
      Create references to objects that do not limit garbage
      collection.  At first blush, weak references seem strange,
      and the strangeness does not really go away quickly.  If
      you do not know why you would want to use these, do not
      worry about it--you do not need to.

  whrandom
      Wichmann-Hill random number generator.  Deprecated since
      Python 2.1, and not necessary to use directly before
      that--use the module [random] to create pseudo-random
      values.

